[Computer Network] HTTP (Part 1)

1. HTTP concept

HTTP (Hypertext Transfer Protocol) is the typical application-layer protocol and one of the most widely used. Its job is to pull content (pages, pictures, and so on) from a server to the local browser, which then interprets and renders it.


A client sends its own "things" to others and, at the same time, wants to fetch other people's "things" to the local machine.
This arrangement is generally called the client-server (CS) model.

In HTTP, web page text, pictures, videos, and audio are collectively called resources.
The "things" above are exactly these resources.

2. URL

To access a server, you must know its IP address and port number.

Users usually type a domain name rather than an IP address, so a domain name resolution service (DNS) is required.
For example, baidu.com (domain name) is resolved to 110.242.68.4 (IP address).

Take the QQ official website, https://www.qq.com, as an example:

https is the protocol
www.qq.com is the server address

A server's port number cannot be chosen arbitrarily: it must be well known to clients and cannot be changed casually.
Each mature application-layer protocol corresponds one-to-one to its well-known port number.

The default port number for https is 443, and
the default port number for http is 80.


The relationship between a protocol name and its port number is fixed and known to everyone,
just as, when there is a fire nearby, the first thing that comes to mind is to call 119 (the fire emergency number in China).

Since HTTP is a hypertext transfer protocol, the client also has to tell the server which resource it wants to access.


In the path part of the URL, the first / represents the web root directory,
and each following / is a path separator.
The whole path, starting from that first /, specifies which resource on the server the URL wants to access.

? is the delimiter that splits the URL into a left part and a right part.
What follows ? are parameters.

Parameters are key-value (KV) pairs: in uid=1, the uid to the left of = is the key (K) and the 1 to the right of = is the value (V).
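To make the pieces above concrete, here is a minimal C++ sketch that splits a URL into protocol, server address, port, path, and KV parameters. The names UrlParts, ParseUrl, and DefaultPort are hypothetical, and the sketch assumes a simple scheme://host[:port][/path][?k=v&k=v] form (several parameters separated by &, no user info, no fragment, no decoding); it is an illustration, not a complete URL parser.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical container for the parts of a URL described above.
struct UrlParts {
    std::string scheme;                         // e.g. "https"
    std::string host;                           // e.g. "www.qq.com"
    uint16_t port = 0;                          // explicit port, or the scheme's well-known port
    std::string path = "/";                     // starts at the web root "/"
    std::map<std::string, std::string> params;  // K=V pairs after '?'
};

// Well-known ports: the one-to-one protocol/port relationship.
inline uint16_t DefaultPort(const std::string& scheme) {
    if (scheme == "http") return 80;
    if (scheme == "https") return 443;
    return 0;  // unknown scheme
}

// Sketch only: assumes "scheme://host[:port][/path][?k=v&k=v]".
inline UrlParts ParseUrl(const std::string& url) {
    UrlParts u;
    std::string rest = url;
    auto s = rest.find("://");
    if (s != std::string::npos) {
        u.scheme = rest.substr(0, s);
        rest = rest.substr(s + 3);
    }
    // '?' separates the left side (address and path) from the parameters.
    std::string query;
    auto q = rest.find('?');
    if (q != std::string::npos) {
        query = rest.substr(q + 1);
        rest = rest.substr(0, q);
    }
    // The first '/' after the host starts the path.
    auto slash = rest.find('/');
    std::string hostport = rest.substr(0, slash);
    if (slash != std::string::npos) u.path = rest.substr(slash);
    auto colon = hostport.find(':');
    u.host = hostport.substr(0, colon);
    u.port = (colon == std::string::npos)
                 ? DefaultPort(u.scheme)
                 : static_cast<uint16_t>(std::stoi(hostport.substr(colon + 1)));
    // Each parameter is K=V; pairs are separated by '&'.
    std::size_t start = 0;
    while (start < query.size()) {
        auto amp = query.find('&', start);
        std::string kv = query.substr(start, amp - start);
        auto eq = kv.find('=');
        if (eq != std::string::npos) u.params[kv.substr(0, eq)] = kv.substr(eq + 1);
        if (amp == std::string::npos) break;
        start = amp + 1;
    }
    return u;
}
```

When no port is written, the sketch falls back to the scheme's well-known port, which is why the browser can omit :443 for https://www.qq.com/.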

URL stands for Uniform Resource Locator.

urlencode and urldecode


  • Characters such as ?, /, # and : have special meaning in a URL. When such a character appears as data (for example inside a parameter value), it is converted into hexadecimal format.
    This keeps it from being confused with the URL's own special characters.
    The conversion is called URL encoding (urlencode). It solves the problem of special symbols in URLs and is done automatically by the browser or client.
  • The server therefore receives the hexadecimal form. To get the original special symbols back, it must decode the data (urldecode).

Escape rules

Convert each byte of the character that needs escaping into two hexadecimal digits and put % in front of them, giving the %XY format. For example, a space (byte 0x20) becomes %20 and ? (byte 0x3F) becomes %3F; a Chinese character, which takes three bytes in UTF-8, becomes three %XY groups.
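The escape rule can be sketched in C++ as follows. UrlEncode, UrlDecode, and HexValue are hypothetical names; the sketch assumes the unreserved set of RFC 3986 (letters, digits, -, ., _, ~) stays unchanged and every other byte is escaped, which is stricter than a real client (for example, a browser does not escape the / separators of a path).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Every byte outside the unreserved set becomes '%' + two hex digits.
inline std::string UrlEncode(const std::string& in) {
    static const char* hex = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high 4 bits
            out += hex[c & 0x0F];  // low 4 bits
        }
    }
    return out;
}

// Value of one hex digit, or -1 if it is not a hex digit.
inline int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Reverses UrlEncode: "%XY" becomes the byte 0xXY; other characters are copied.
inline std::string UrlDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = HexValue(in[i + 1]);
            int lo = HexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += in[i];  // ordinary character ('+' is not turned into a space here)
    }
    return out;
}
```

For example, UrlEncode("a b?c") yields a%20b%3Fc, and UrlDecode turns it back into a b?c.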


Online URL encoding tools can also perform urlencode and urldecode automatically.

3. A macro view of HTTP

HTTP request

A complete HTTP request message is divided into four parts.


The first part - request line
The HTTP request line is a single line divided into three parts: request method, URL, and protocol version.
Request method: GET / POST
URL: the requested resource
Protocol version: HTTP/1.0, HTTP/1.1, HTTP/2.0
The three parts are separated by spaces.


Part 2 - Request headers
A multi-line structure of Key: Value pairs, one header per line


Part 3 - Blank line
A line containing only \r\n, which marks the end of the headers


Part 4 - Payload (body)
Usually the parameters the user submits; it is optional and may be empty
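As an illustration of how the four parts line up, the following sketch joins a request line, Key: Value headers, the blank line, and an optional payload with \r\n. BuildRequest is a hypothetical helper, not part of the article's server.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins the four parts of an HTTP request.
inline std::string BuildRequest(const std::string& method, const std::string& url,
                                const std::string& version,
                                const std::vector<std::pair<std::string, std::string>>& headers,
                                const std::string& body) {
    const std::string SEP = "\r\n";
    std::string msg = method + " " + url + " " + version + SEP;  // 1. request line
    for (const auto& h : headers)
        msg += h.first + ": " + h.second + SEP;                   // 2. request headers
    msg += SEP;                                                   // 3. blank line
    msg += body;                                                  // 4. payload (may be empty)
    return msg;
}
```

A GET request typically has an empty payload, while a POST request carries the submitted parameters after the blank line.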

HTTP response

The status line is divided into three parts: protocol version, status code, and status code description.
The three parts are separated by spaces.
Protocol version: HTTP/1.0, HTTP/1.1, HTTP/2.0
Status code: for example, 404
Status code description: the description for 404 is Not Found


The response headers are also a multi-line structure of Key: Value pairs, and they are likewise followed by a blank line.


The payload may be an HTML/CSS file, a requested image, and so on.
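Parsing the status line is the mirror image of parsing the request line. In this sketch (StatusLine and ParseStatusLine are hypothetical names), the first two space-separated fields are the version and the code, and the rest of the line is kept as the description, since a description such as Not Found can itself contain spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct StatusLine {
    std::string version;      // e.g. "HTTP/1.1"
    int code = 0;             // e.g. 404
    std::string description;  // e.g. "Not Found" (may contain spaces)
};

// Reads version and code with >>, then takes the rest of the line as the description.
inline bool ParseStatusLine(const std::string& line, StatusLine* out) {
    std::istringstream ss(line);
    if (!(ss >> out->version >> out->code)) return false;
    std::getline(ss >> std::ws, out->description);  // skip the space, keep inner spaces
    return true;
}
```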

4. A first look at HTTP requests and responses

request header

When you enter the host IP + port number in the browser, the server on Linux displays the following request:

GET / HTTP/1.1
The first line is the request line.


The multi-line Key: Value structure that follows is the request headers;
this request carries no payload.
Host indicates which host the request is sent to, usually the IP address and port number of the target server.
Connection indicates the connection mode of this request: long (keep-alive) or short connection.
Cache-Control indicates the caching policy for the exchange; max-age is the maximum time a cached copy stays fresh, and a value of 0 means no cached copy should be used.
User-Agent indicates the client information of the HTTP request (browser, operating system, and so on).
Accept-Encoding indicates the encoding (compression) types the client can accept.
Accept-Language indicates the languages the client can accept.
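A hedged sketch of turning these Key: Value lines into a lookup table. ParseHeaders is a hypothetical name; it assumes each line ends in \r\n and splits at the first colon, which matters for values such as 127.0.0.1:8888 that contain colons themselves. Header names are case-insensitive in HTTP, and this sketch does not normalize them.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Reads "Key: Value" lines until the blank line and stores them in a map.
inline std::map<std::string, std::string> ParseHeaders(const std::string& headerBlock) {
    std::map<std::string, std::string> headers;
    std::istringstream ss(headerBlock);
    std::string line;
    while (std::getline(ss, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // drop the '\r' of "\r\n"
        if (line.empty()) break;                                    // blank line: end of headers
        auto colon = line.find(':');
        if (colon == std::string::npos) continue;                   // not a header line
        std::string value = line.substr(colon + 1);
        value.erase(0, value.find_first_not_of(' '));               // trim leading spaces
        headers[line.substr(0, colon)] = value;
    }
    return headers;
}
```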

1. Simulate a simple response

Create Main.cc; the server calls the callback function HandlerHttp to handle each request from start to finish.


HandlerHttp assumes that it receives a complete HTTP request message. It builds the response from a status line, the blank-line delimiter, and a payload, and returns it. The payload is what is shown in the web page.

response header

When the message is parsed as text, it is read line by line; as soon as a blank line is found, the headers are considered fully read.

The key header here is Content-Length, whose value is the length of the body (the payload) in bytes. It tells the receiver how many bytes to read after the blank line.
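The two rules above (read headers up to the blank line, then read Content-Length bytes) are enough to decide whether a receive buffer holds a whole message. HasCompleteMessage is a hypothetical sketch: it matches the header name exactly as Content-Length (real header names are case-insensitive) and treats a missing header as an empty body, which fits GET requests like the one above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true and sets *totalLen when `buffer` starts with one complete message.
inline bool HasCompleteMessage(const std::string& buffer, std::size_t* totalLen) {
    const std::string blank = "\r\n\r\n";
    auto end = buffer.find(blank);
    if (end == std::string::npos) return false;  // headers not fully read yet
    std::size_t headerLen = end + blank.size();
    std::size_t bodyLen = 0;
    const std::string key = "Content-Length:";
    auto k = buffer.find(key);
    if (k != std::string::npos && k < end)        // only look inside the headers
        bodyLen = std::stoul(buffer.substr(k + key.size()));  // stoul skips leading spaces
    if (buffer.size() < headerLen + bodyLen) return false;    // body not fully read yet
    *totalLen = headerLen + bodyLen;
    return true;
}
```

A server could keep appending received bytes to a buffer until this returns true, then cut off the first *totalLen bytes as one request.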


Run the program on Linux with a port number as its argument,
then enter the host IP + port number in the browser: the server started by main calls the callback function, and the page shows "this is a test".
At the same time, Linux shows the response: status line, response headers, blank line, and payload.


Since the payload can be an image, video, audio, or another kind of resource,
the Content-Type header states the type of the body so that the receiver can tell them apart.


Pictures, videos, and audio resources are essentially files.

The suffix of pictures is .png,
the suffix of web pages is .html,
and the suffix of audio is .mp3 (videos commonly use .mp4).
Resources on Linux all have their own suffixes. To tell the peer what type a resource is, a suffix-to-Content-Type mapping table is needed.


If the suffix is .html, the Content-Type is text/html.
If the suffix is .png, the Content-Type is image/png.


For the web page, add the header Content-Type: text/html, followed by the SEP delimiter (\r\n), to the response.
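Putting the status line, Content-Length, Content-Type, the blank line, and the payload together, a response builder might look like this. BuildResponse is a hypothetical helper that mirrors the response assembled in HandlerHttp, always with status 200.

```cpp
#include <cassert>
#include <string>

const std::string SEP = "\r\n";

// Status line, two headers, blank line, payload.
inline std::string BuildResponse(const std::string& body, const std::string& contentType) {
    std::string response = "HTTP/1.0 200 OK" + SEP;                      // status line
    response += "Content-Length: " + std::to_string(body.size()) + SEP;  // body length in bytes
    response += "Content-Type: " + contentType + SEP;                    // body type
    response += SEP;                                                     // blank line
    response += body;                                                    // payload
    return response;
}
```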

2. Get content from path

Give the HTTP server its own directory, wwwroot (the web root),
create index.html in it, and put all the resources of this web page there.


Create Util.hpp.
In the Util class, create an interface ReadFile that reads the entire content of a file.

The first parameter, path, is the path of the file, and
the second parameter, fileContent, is an output parameter that receives the content of the file.


Here path is the path of index.html in the wwwroot directory;
ReadFile reads that file and stores its content in the string body.

Implementation of the ReadFile function

1. Get the size of the file itself
with the stat system call (see man 2 stat).

stat obtains the struct stat attributes of the file at the specified path.
It returns 0 on success and -1 on failure.


st_size: the size of the file in bytes
st_mode: the file type and permission bits, tested with macros such as S_ISREG and S_ISDIR



2. Resize the string so that the whole file fits

Resize it to size bytes.


3. Read

Open the file with O_RDONLY (read-only) and read its content into the string.


With this path, ReadFile finds the content of index.html and passes it to the body string as the payload.
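The three steps can be sketched as a standalone function. ReadWholeFile is a hypothetical name, not the article's Util::ReadFile; it differs by looping on read(), which may return fewer bytes than requested, and by refusing paths that are not regular files (checked with S_ISREG on st_mode).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

inline bool ReadWholeFile(const std::string& path, std::string* fileContent) {
    // 1. Get the file size (and make sure it is a regular file).
    struct stat st;
    if (stat(path.c_str(), &st) < 0 || !S_ISREG(st.st_mode)) return false;
    // 2. Make room for the whole file.
    fileContent->resize(st.st_size);
    // 3. Read until all bytes have arrived.
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    std::size_t done = 0;
    while (done < fileContent->size()) {
        ssize_t n = read(fd, &(*fileContent)[done], fileContent->size() - done);
        if (n <= 0) break;  // error or unexpected end of file
        done += static_cast<std::size_t>(n);
    }
    close(fd);
    fileContent->resize(done);  // keep only the bytes actually read
    return done == static_cast<std::size_t>(st.st_size);
}
```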

3. Distinguish between different resources

At this point the server answers every request with the same page, no matter which resource is requested.

When different resources are requested, the server should distinguish them and give
users exactly what they ask for. If the resource does not exist, it should return 404.
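A minimal sketch of that rule, assuming the file path has already been built from the URL. RespondWithFileOr404 is a hypothetical helper; it uses std::ifstream instead of stat/open/read for brevity, and the 404 body text is an arbitrary choice.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Serve the file if it can be opened; otherwise answer 404 Not Found.
inline std::string RespondWithFileOr404(const std::string& path) {
    const std::string SEP = "\r\n";
    std::ifstream in(path, std::ios::binary);
    std::string status = "HTTP/1.0 200 OK";
    std::string body;
    if (in) {
        std::ostringstream ss;
        ss << in.rdbuf();  // read the whole file
        body = ss.str();
    } else {
        status = "HTTP/1.0 404 Not Found";
        body = "<h1>404 Not Found</h1>";
    }
    return status + SEP + "Content-Length: " + std::to_string(body.size()) + SEP + SEP + body;
}
```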


To process the request, deserialize it, converting the string into structured fields.
Create an HttpRequest structure,
which holds the request method, URL, and protocol version from the request line, as well as the request headers.




Since the URL names the requested resource, the file path is now derived from req.url_ instead of being fixed.

Deserialization implementation

In Main.cc,
Deserialize calls ReadOneLine to take the first line of the message, the request line,
and ParseRequestLine to split the request line into the request method, URL, and protocol version.

Both functions are implemented in Util.hpp.

Implementation of the ReadOneLine function

The functions are declared static so that they have no hidden this pointer and can be called as Util::ReadOneLine without an object.
Use find to look for the separator sep; if it is found, find returns its index pos.
Use substr(0, pos) to take the substring in the interval [0, pos) as the return value.
Use erase to delete pos + sep.size() characters starting from index 0, which removes the line and its separator from the message.


Implementation of the ParseRequestLine function

A std::stringstream treats spaces as delimiters, and >> extracts the request line into three strings.
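Both helpers can be tried on a sample message outside the server. These are free-function sketches of ReadOneLine and ParseRequestLine; unlike the article's version, this ParseRequestLine reports failure when the line has fewer than three fields.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the text before the first `sep` and erases that line plus the
// separator from `message`. Returns "" if no separator is found.
inline std::string ReadOneLine(std::string& message, const std::string& sep) {
    auto pos = message.find(sep);
    if (pos == std::string::npos) return "";
    std::string line = message.substr(0, pos);  // characters in [0, pos)
    message.erase(0, pos + sep.size());          // drop the line and its separator
    return line;
}

// Splits the request line on whitespace into method, URL, and version.
inline bool ParseRequestLine(const std::string& line, std::string* method,
                             std::string* url, std::string* httpVersion) {
    std::stringstream ss(line);
    return static_cast<bool>(ss >> *method >> *url >> *httpVersion);
}
```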

The final form of path

The path must start with the web root directory,

so define a web root directory, webRoot (./wwwroot).


When handling a request, start from the web root directory and then append the URL (the requested resource). For example, the URL /image/1.jpg becomes ./wwwroot/image/1.jpg, and a URL ending in / gets the default home page, index.html, appended.
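The mapping from URL to file path can be sketched as below. BuildPath is a hypothetical helper; it also cuts off any ?query part, an addition assumed here because a URL such as /?uid=1 would otherwise turn into a nonexistent file name.

```cpp
#include <cassert>
#include <string>

// Maps a request URL to a file under the web root.
inline std::string BuildPath(const std::string& webRoot, const std::string& url,
                             const std::string& homePage = "index.html") {
    std::string resource = url.substr(0, url.find('?'));         // keep only the path part
    std::string path = webRoot + resource;                        // "./wwwroot" + "/image/1.jpg"
    if (!path.empty() && path.back() == '/') path += homePage;    // "/" means the home page
    return path;
}
```

A production server would also reject URLs containing .., which could otherwise escape wwwroot.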

4. Display text and pictures at the same time



Create an image directory in wwwroot and enter it.


wget: a command that downloads resources from a remote server

Run wget followed by the image address to download the image.


Use the mv command to rename the downloaded image to 1.jpg.


The image can now be previewed in the image directory in VS Code.


A web page contains many resources, such as pictures, text, and videos.
The browser must issue a separate HTTP request for each of them.

Search for w3cschool in your browser.


In its HTML tutorial, open the HTML images page and look at the alternative text attribute, alt.


In src="/image/1.jpg", the first / represents the web root directory, wwwroot,
so the server looks for 1.jpg in the image directory under wwwroot.

If the image fails to load, the alt text "这是一张石榴花图片" ("This is a picture of pomegranate flowers") is displayed instead.


Since this page contains both text and an image, whose types differ, the Content-Type (body type) must be set for each resource.

Add a member variable, suffix_, to HttpRequest to record what kind of resource is being accessed (such as a picture or text).


In the deserialization function, use rfind to search for "." from back to front, then use substr to take the characters starting at index pos.
substr(pos, len) takes len characters; if len is omitted, it takes everything up to the end of the path string, which is exactly the suffix.
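One pitfall: a plain rfind(".") on the whole path can match the . of the leading ./ when the file name has no suffix. This hypothetical GetSuffix sketch only accepts a . that comes after the last /, and otherwise falls back to .html, matching the default used in the deserialization function.

```cpp
#include <cassert>
#include <string>

// Takes the suffix with rfind + substr, but only from the file name itself.
inline std::string GetSuffix(const std::string& path, const std::string& fallback = ".html") {
    auto slash = path.rfind('/');
    auto dot = path.rfind('.');
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        return fallback;      // no '.' in the file name
    return path.substr(dot);  // len omitted: take everything to the end
}
```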


In HandlerHttp, where the request is used,
the logic for the Content-Type (body type) is wrapped in a GetContentType interface.

Implementation of the GetContentType function

If the suffix is .html, the Content-Type is text/html.
If the suffix is .css, the Content-Type is text/css.
If the suffix is .js, the Content-Type is application/x-javascript.
If the suffix is .png, the Content-Type is image/png.
If the suffix is .jpg, the Content-Type is image/jpeg.
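The same table can be expressed as a lookup map instead of an if/else chain. ContentTypeFor is a hypothetical name, and the fallback application/octet-stream (generic binary data) for unknown suffixes is an assumption.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Suffix-to-Content-Type table as a map.
inline std::string ContentTypeFor(const std::string& suffix) {
    static const std::unordered_map<std::string, std::string> table = {
        {".html", "text/html"}, {".htm", "text/html"},
        {".css", "text/css"},   {".js", "application/x-javascript"},
        {".png", "image/png"},  {".jpg", "image/jpeg"},
    };
    auto it = table.find(suffix);
    return it == table.end() ? "application/octet-stream" : it->second;
}
```

Adding a new type then only means adding one entry to the table.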


Entering the host IP + port number in the browser now shows that the image is not displayed and garbled characters appear.


A web page must specify its character encoding, otherwise garbled characters appear,
so modify index.html to declare <meta charset="UTF-8"> in its head.


Enter the host IP and port number again: the text and the image are now displayed together.

5. Complete code of the simulation

wwwroot

index.html (with picture)

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>

<body>
    <h1>this is a test </h1>
    <h1>this is a test </h1>
    <h1>this is a test </h1>
    <h1>this is a test </h1>
    <img src="/image/1.jpg" alt="这是一张石榴花图片">
</body>

</html>  

Err.hpp (error)

#pragma once 

enum
{
  USAGE_ERR=1,
  SOCKET_ERR,//2
  BIND_ERR,//3
  LISTEN_ERR,//4
  SETSID_ERR,//5
  OPEN_ERR//6
};




HttpServer.hpp (initialization and startup)



#pragma once
#include<iostream>
#include<string>
#include<pthread.h>
#include<functional>
#include"Sock.hpp"

static const uint16_t defaultport=8888;//default port number

class HttpServer;
//func_t: a wrapper for callables that take a std::string& and return a std::string
using func_t =std::function<std::string(std::string&)>;

class ThreadData
{
public:
    ThreadData(int sock,std::string ip,const uint16_t& port,HttpServer*tsvrp)//constructor
    :_sock(sock),_tsvrp(tsvrp),_ip(ip),_port(port)
    {}
    ~ThreadData()
    {}
public:
    int _sock;//socket
    HttpServer *_tsvrp;//points to the HTTP server
    std::string _ip;
    uint16_t   _port;
};

class HttpServer
{
public:
    HttpServer(func_t f,int port= defaultport)
    :port_(port),func(f)
    {}

    void InitServer()//initialize
    {
        listensock_.Socket();//create the socket
        listensock_.Bind(port_);//bind
        listensock_.Listen();//listen
    }

    void HandlerHttpRequest(int sock)
    {
        char buffer[4096];
        std::string request;

        //read data from the socket into buffer
        //(a single recv: assumes the whole request arrives at once and fits in the buffer)
        ssize_t s=recv(sock,buffer,sizeof(buffer)-1,0);
        if(s>0)//read succeeded
        {
            buffer[s]=0;//terminate the string with '\0'
            request=buffer;
            std::string response =func(request);//callback: turn the request into a response
            //send the response to the socket; send may write fewer bytes than asked, so loop
            size_t sent=0;
            while(sent<response.size())
            {
                ssize_t n=send(sock,response.c_str()+sent,response.size()-sent,0);
                if(n<=0) break;
                sent+=n;
            }
        }
        else
        {
            //read failed or the client closed the connection
            logMessage(Info,"client quit ...");//log
        }
    }

    static void* threadRoutine(void *args)
    {
        //detach the thread: its return value is not needed, so it cleans up by itself
        pthread_detach(pthread_self());
        ThreadData* td=(ThreadData*)args;
        td->_tsvrp->HandlerHttpRequest(td->_sock);
        close(td->_sock);

        delete td;
        return nullptr;
    }

    void Start()//start
    {
        for(;;)
        {
            std::string clientip;
            uint16_t clientport;
            int sock=listensock_.Accept(&clientip,&clientport);//get a connection and the client's IP and port
            if(sock<0)
            {
                continue;
            }
            pthread_t tid;
            ThreadData *td =new ThreadData(sock,clientip,clientport,this);
            pthread_create(&tid,nullptr,threadRoutine,td);
        }
    }

    ~HttpServer()
    {}
private:
    int port_;       //port number
    Sock listensock_;//listening socket
    func_t func;     //callback stored in a std::function wrapper
};


Log.hpp (log)

#pragma once
#include<iostream>
#include<string>
#include<cstdio>
#include<cstring>
#include<cstdarg>
#include<unistd.h>
#include<sys/types.h>
#include<time.h>

const std::string filename="tecpserver.log";

//log levels
enum{
    Debug=0, //0 debugging
    Info,    //1 normal
    Warning, //2 warning
    Error,   //3 ordinary error
    Tatal,   //4 fatal error
    Uknown   //unknown
};

static std::string tolevelstring(int level)//convert a level number to a string
{
    switch(level)
    {
        case Debug: return "Debug";
        case Info: return "Info";
        case Warning: return "Warning";
        case Error: return "Error";
        case Tatal: return "Tatal";
        default: return "Uknown";
    }
}
std::string gettime()//get the current time
{
    time_t curr=time(nullptr);//current time as time_t
    struct tm *tmp=localtime(&curr);//convert time_t into a struct tm
    char buffer[128];
    snprintf(buffer,sizeof(buffer),"%d-%d-%d %d:%d:%d",tmp->tm_year+1900,tmp->tm_mon+1,tmp->tm_mday,
             tmp->tm_hour,tmp->tm_min,tmp->tm_sec);
    return buffer;
}
void logMessage(int level, const char*format,...)
{
    //left part of the log line: level, time, process id
    char logLeft[1024];
    std::string level_string=tolevelstring(level);
    std::string curr_time=gettime();
    snprintf(logLeft,sizeof(logLeft),"%s %s %d ",level_string.c_str(),curr_time.c_str(),(int)getpid());

    //right part of the log line: the caller's formatted message
    char logRight[1024];
    va_list p;//cursor over the variadic arguments
    va_start(p,format);//start right after format
    vsnprintf(logRight,sizeof(logRight),format,p);
    va_end(p);//clean up the argument list

    //print the log
    printf("%s%s\n",logLeft,logRight);

    //also append it to the log file
    FILE*fp=fopen(filename.c_str(),"a");//open filename in append mode
    //fopen returns a null pointer on failure
    if(fp==nullptr)
    {
        return;
    }
    fprintf(fp,"%s%s\n",logLeft,logRight);//write the formatted line to the file
    fflush(fp);//flush the buffer
    fclose(fp);
}


Main.cc (callback function call)



#include<vector>
#include<memory>
#include"HttpServer.hpp"
#include"Util.hpp"

using namespace std;
const std::string SEP="\r\n";

const std::string defaultHomePage ="index.html";//default home page
const std::string webRoot="./wwwroot";//web root directory

class HttpRequest
{
public:
    HttpRequest()
    :path_(webRoot)
    {}

    ~HttpRequest()
    {}

    void Print()
    {
        logMessage(Debug,"method:%s,url:%s,version:%s",method_.c_str(),url_.c_str(),httpVersion_.c_str());
        /*for(const auto&line:body_)
        {
          logMessage(Debug,"-%s",line.c_str());
        }
        */
        logMessage(Debug,"path:%s",path_.c_str());
        logMessage(Debug,"suffix:%s",suffix_.c_str());
    }
public:
    std::string method_;//request method
    std::string url_;   //URL
    std::string httpVersion_;//protocol version
    std::vector<std::string> body_;//request header lines

    std::string path_; //resource to access

    std::string suffix_;//suffix, used to tell what kind of resource is requested
};

//deserialization: convert the string into an HttpRequest structure
HttpRequest Deserialize(std::string &message)
{
    HttpRequest req;
    std::string line=Util::ReadOneLine(message,SEP);//take the request line out of message

    //split the request line into request method, URL, protocol version
    Util::ParseRequestLine(line,&req.method_,&req.url_,&req.httpVersion_);

    logMessage(Info,"method:%s,url:%s,version:%s",req.method_.c_str(),req.url_.c_str(),req.httpVersion_.c_str());

    //what remains are the request headers: take one line at a time until the blank line
    while(!message.empty())
    {
        line=Util::ReadOneLine(message,SEP);
        if(line.empty()) break;//blank line (or no separator left): the headers end here
        req.body_.push_back(line);
    }

    req.path_ += req.url_;   //path_ already starts with the web root, so just append the resource

    //a path ending in '/' gets the default home page
    if(req.path_[req.path_.size()-1]=='/')
    {
        req.path_+= defaultHomePage;
    }

    auto pos=req.path_.rfind(".");
    //no '.' after the last '/': the file name itself has no suffix
    if(pos==std::string::npos || pos<req.path_.rfind('/'))
    {
        req.suffix_=".html";//default to html
    }
    else
    {
        req.suffix_=req.path_.substr(pos);
    }
    return req;
}

std::string GetContentType(std::string &suffix)//map the resource suffix to a Content-Type header
{
    std::string content_type ="Content-Type: ";//a header line must start with its name, no leading space
    if(suffix==".html"|| suffix==".htm")
    {
        content_type+="text/html";
    }
    else if(suffix==".css")
    {
        content_type+="text/css";
    }
    else if(suffix==".js")
    {
        content_type+="application/x-javascript";
    }
    else if(suffix==".png")
    {
        content_type+="image/png";
    }
    else if(suffix==".jpg")
    {
        content_type+="image/jpeg";
    }
    else
    {
        content_type+="application/octet-stream";//unknown type: generic binary data
    }
    return content_type+SEP;
}
std::string HandlerHttp( std::string &message)//implementation of the callback
{
    //1. Read the request
    //message is assumed to be one complete HTTP request message;
    //the return value is the HTTP response sent back to the client
    cout<<"---------------------------"<<endl;

    //2. Deserialize and analyze the request
    HttpRequest req =  Deserialize(message);
    req.Print();

    //3. Use the request
    std::string body;//payload

    //put the content of the file at path into body
    bool found=Util::ReadFile(req.path_,&body);
    if(!found) body.clear();//resource missing: no payload

    //build the response
    //status line: protocol version, status code, status code description
    //200 means the request succeeded; 404 means the resource does not exist
    std::string response;
    if(found) response="HTTP/1.0 200 OK"+SEP;
    else response="HTTP/1.0 404 Not Found"+SEP;

    //Content-Length: the length of the payload
    response+="Content-Length: "+std::to_string(body.size())+SEP;//response header
    response+=GetContentType(req.suffix_);

    response += SEP;  //blank line separator
    response += body; //payload
    return response;
}
int main(int argc,char* argv[])
{
    if(argc!=2)
    {
        cerr<<"Usage: "<<argv[0]<<" port"<<endl;
        exit(USAGE_ERR);
    }
    uint16_t port=atoi(argv[1]);
    std::unique_ptr<HttpServer> tsvr(new HttpServer(HandlerHttp,port));
    tsvr->InitServer();
    tsvr->Start();
    return 0;
}
 






makefile

httserver:Main.cc
	g++ -o $@ $^ -std=c++11 -lpthread
.PHONY:clean
clean:
	rm -f httserver

Sock.hpp (TCP socket)



#pragma once
#include<iostream>
#include<cstring>
#include<cstdlib>
#include<cerrno>
#include<netinet/in.h>
#include<arpa/inet.h>
#include<sys/socket.h>
#include<unistd.h>
#include"Log.hpp"
#include"Err.hpp"

static const int gbacklog=32;
static const int defaultfd=-1;
class Sock
{
public:
    Sock() //constructor
    :_sock(defaultfd)
    {}

    void Socket()//create the socket
    {
        _sock=socket(AF_INET,SOCK_STREAM,0);
        if(_sock<0)//socket creation failed
        {
            logMessage(Tatal,"socket error,code:%d,errstring:%s",errno,strerror(errno));
            exit(SOCKET_ERR);
        }
    }

    void Bind(uint16_t port)//bind
    {
        struct sockaddr_in local;
        memset(&local,0,sizeof(local));//zero the structure
        local.sin_family=AF_INET;//16-bit address family
        local.sin_port= htons(port); //port number in network byte order
        local.sin_addr.s_addr= INADDR_ANY;//accept connections on any local IP address

        //bind returns a value below 0 on failure
        if(bind(_sock,(struct sockaddr*)&local,sizeof(local))<0)
        {
            logMessage(Tatal,"bind error,code:%d,errstring:%s",errno,strerror(errno));
            exit(BIND_ERR);
        }
    }

    void Listen()//put the socket into the listening state
    {
        //listen returns a value below 0 on failure
        if(listen(_sock,gbacklog)<0)
        {
            logMessage(Tatal,"listen error,code:%d,errstring:%s",errno,strerror(errno));
            exit(LISTEN_ERR);
        }
    }

    int Accept(std::string *clientip,uint16_t * clientport)//get a new connection
    {
        struct sockaddr_in temp;
        socklen_t len=sizeof(temp);
        int sock=accept(_sock,(struct sockaddr*)&temp,&len);

        if(sock<0)
        {
            logMessage(Warning,"accept error,code:%d,errstring:%s",errno,strerror(errno));
        }
        else
        {
            //inet_ntoa: 4-byte IP to dotted-decimal string
            *clientip = inet_ntoa(temp.sin_addr); //client IP address
            //ntohs: network byte order to host byte order
            *clientport= ntohs(temp.sin_port);//client port number
        }
        return sock;//return the newly accepted socket
    }

    int Connect(const std::string&serverip,const uint16_t &serverport )//initiate a connection
    {
        struct sockaddr_in server;
        memset(&server,0,sizeof(server));//zero the structure
        server.sin_family=AF_INET;//16-bit address family
        server.sin_port=htons(serverport);//port number
        //inet_addr: dotted-decimal string to 4-byte IP
        server.sin_addr.s_addr=inet_addr(serverip.c_str());//IP address
        //returns 0 on success, -1 on failure
        return connect(_sock,(struct sockaddr*)&server,sizeof(server));
    }

    int Fd()
    {
        return _sock;
    }
    void Close()
    {
        if(_sock!=defaultfd)
        {
            close(_sock);
        }
    }

    ~Sock()//destructor
    {}
private:
    int _sock;
};

Util.hpp

#pragma once
#include<iostream>
#include<string>
#include<sys/types.h>
#include<sys/stat.h>
#include<unistd.h>
#include<fcntl.h>
#include<sstream>
#include"Log.hpp"

class Util
{
public:
    static bool ReadFile(const std::string &path,std::string *fileContent)//read the entire file content
    {
        //1. Get the size of the file itself
        struct stat st;//receives the file attributes
        int n=stat(path.c_str(),&st);
        if(n<0)//stat failed (e.g. the file does not exist)
        {
            return false;
        }
        int size = st.st_size;

        //2. Resize the string so the whole file fits
        fileContent->resize(size);

        //3. Read
        int fd=open(path.c_str(),O_RDONLY);
        if(fd<0)//open failed
        {
            return false;
        }
        //read from fd directly into the string's buffer
        ssize_t s=read(fd,&(*fileContent)[0],size);
        close(fd);
        if(s!=size)//short read or error (e.g. path is a directory)
        {
            return false;
        }
        logMessage(Info,"read file %s done",path.c_str());
        return true;
    }

    //take one line (up to sep) out of message
    static std::string ReadOneLine(std::string &message,const std::string &sep)
    {
        auto pos=message.find(sep);//find the separator sep; pos is its index
        if(pos==std::string::npos)//not found
        {
            return "";
        }
        std::string s=message.substr(0,pos);//substring in [0, pos)
        message.erase(0,pos+sep.size());//erase pos+sep.size() characters starting at index 0
        return s;
    }

    //parse the request line into request method, URL, protocol version
    static bool ParseRequestLine(const std::string &line,std::string * method,std::string *url,std::string *httpVersion)
    {
        //extract the fields separated by spaces
        std::stringstream ss(line);
        ss >> *method >> *url >> *httpVersion;
        return true;
    }
};





Origin blog.csdn.net/qq_62939852/article/details/132650575