Reading .shp files with C++

1. Shape file format

1.1 First look at the shape file
According to the ESRI Shapefile Technical Description, the three most important files of a complete shapefile are *.shp, *.shx, and *.dbf. They are, respectively, the main file, the index file, and the dBASE (attribute) file. The main file (.shp) stores the geometry, whose three basic types are points, lines, and polygons. The index file (.shx) stores the position at which each record starts in the main file. The dBASE file (.dbf) stores the attributes of each feature.
All three files are stored in binary form, so when reading them in C++ you need to pass ios::binary when the ifstream object is constructed, which opens the file in binary mode. The file is opened as follows:

ifstream inFile("C:\\Users\\Administrator\\Desktop\\test.shp", ios::binary | ios::in);
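It is worth checking that the file really opened before reading from it. The following is a minimal sketch of that check; the helper name readLeadingBytes is hypothetical and not part of the original project.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to `count` bytes from the start of a file
// opened in binary mode. Returns an empty vector if the file cannot be opened.
std::vector<char> readLeadingBytes(const std::string& path, std::size_t count)
{
    std::ifstream inFile(path, std::ios::binary | std::ios::in);
    if (!inFile.is_open())
        return {};

    std::vector<char> buffer(count);
    inFile.read(buffer.data(), static_cast<std::streamsize>(count));
    // Keep only the bytes that were actually read (the file may be shorter).
    buffer.resize(static_cast<std::size_t>(inFile.gcount()));
    return buffer;
}
```

Calling readLeadingBytes(path, 100) on a valid .shp file would return the 100-byte file header described in the next section.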

1.2 .shp file
Each .shp file is divided into two parts: a 100-byte file header, and the data that you actually need, which are called the data records.
1.2.1 .shp file header
The file header runs from the first byte of the file to the 100th. When reading, these are bytes 0-99, 100 bytes in total. Offsets start from 0, just like array indices, and all file-pointer positions mentioned below are zero-based in the same way.
Everything from byte 100 to the end of the file is data records.
The file header is laid out as follows (taken from the ESRI specification, in place of the original figure):

Byte    Field        Value   Type     Byte order
0       File Code    9994    Integer  Big
4-20    Unused       0       Integer  Big
24      File Length          Integer  Big
28      Version      1000    Integer  Little
32      Shape Type           Integer  Little
36      Xmin                 Double   Little
44      Ymin                 Double   Little
52      Xmax                 Double   Little
60      Ymax                 Double   Little
68      Zmin                 Double   Little
76      Zmax                 Double   Little
84      Mmin                 Double   Little
92      Mmax                 Double   Little

The leftmost column is the byte offset. For example, File Length is at byte 24, so to read it you only need to move the file pointer to 24. Note that File Length is measured in 16-bit words, not bytes. The rightmost column is the byte order. Different computer systems may use different endianness: Big means big-endian and Little means little-endian. I won't explain big-endian and little-endian in detail here; I'll just post a byte-order conversion routine that can be used directly:

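// Reverse the byte order of a value (big-endian <-> little-endian)
template<class T>
T ByteTrans(T m)
{
    // All members of a union share the same memory
    union n_
    {
        T n;
        char mem[sizeof(T)];
    };
    n_ big, small;
    big.n = m;
    // Copy the bytes in reverse order
    for (unsigned int i = 0; i < sizeof(T); i++)
    {
        small.mem[i] = big.mem[sizeof(T) - i - 1];
    }
    return small.n;
}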
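ByteTrans always swaps, so it gives the right answer for big-endian fields only when the program runs on a little-endian machine. As an alternative, here is a sketch that builds the value from individual bytes and so does not depend on the host byte order. The function names readBE32, readLE32, and readLEDouble are hypothetical, and the double version assumes the platform uses 64-bit IEEE 754 doubles.

```cpp
#include <cstdint>
#include <cstring>

// Decode a 32-bit big-endian integer from 4 raw bytes.
std::int32_t readBE32(const unsigned char* p)
{
    const std::uint32_t v = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                            (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
    return static_cast<std::int32_t>(v);
}

// Decode a 32-bit little-endian integer from 4 raw bytes.
std::int32_t readLE32(const unsigned char* p)
{
    const std::uint32_t v = (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16) |
                            (std::uint32_t(p[1]) << 8)  |  std::uint32_t(p[0]);
    return static_cast<std::int32_t>(v);
}

// Decode a little-endian double from 8 raw bytes.
// Assumption: double is a 64-bit IEEE 754 value on this platform.
double readLEDouble(const unsigned char* p)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits");
    std::uint64_t bits = 0;
    for (int i = 7; i >= 0; --i)
        bits = (bits << 8) | p[i];   // most significant byte is the last one
    double d;
    std::memcpy(&d, &bits, sizeof(d));
    return d;
}
```

For example, the four bytes 00 00 27 0A decode to 9994 with readBE32, which is the file code at the start of every .shp file.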

The most important fields in the file header are ShapeType and Box. ShapeType is the geometry type of the file, corresponding to the three geometric elements: point, line, and polygon. Box is the extent of all points recorded in the .shp file: the minimum and maximum X and the minimum and maximum Y (and, if the file has Z and M values, their minimum and maximum as well, although so far I haven't seen a .shp file with M).
The ShapeType values in the file header for Point, Polyline, and Polygon are 1, 3, and 5, respectively.

Next come the data records of the file.
1.2.2 Data record of .shp file
The data records start at the 101st byte of the .shp file, that is, at file pointer position 100, and continue to the end of the file.
Each data record has two parts. The first part is the record header, which consists of two big-endian integers and has a fixed length of 8 bytes. The second part is the record content, which has a variable length. All data in the record content is stored in little-endian order.
The first big-endian integer in the record header is the number of the current record, starting from 1. The second is the length of the record content (RecordLength), measured in 16-bit words. When you simply read every record in order, the fixed-length record header is generally of little use.
The record content begins with the shape type (ShapeType), followed by the actual point data. The three geometric elements point, line, and polygon are stored differently in the .shp file, as shown in the following table:

ShapeType       Storage format
1 (Point)       double X, double Y
3 (Polyline)    double Box[4], int NumParts, int NumPoints, int Parts[NumParts], double Points[NumPoints][2] (X, Y)
5 (Polygon)     double Box[4], int NumParts, int NumPoints, int Parts[NumParts], double Points[NumPoints][2] (X, Y)

For points, the record content contains only ShapeType (value 1) and two double values, which are the X and Y coordinates.
For lines and polygons, ShapeType is followed by double Box[4], the extent of all points in the current record, stored in the order Xmin, Ymin, Xmax, Ymax.
NumParts is the number of segments of the line (or the number of rings of the polygon) in the current record.
NumPoints is the total number of points in the current record.
int Parts[NumParts] stores, for each segment (or ring), the index of its first point in the point array. The indices start from 0.
double Points[NumPoints][2] holds the X and Y coordinates of the points, stored in order: X1, Y1, X2, Y2, X3, Y3... until the end of the current record.
1.2.3 Summary
That's it for the introduction of the .shp file.
1.3 .shx file
The .shx file is the index file. It consists of a 100-byte file header followed by fixed-length records.
The file header has the same layout as that of the .shp file. Each fixed-length record is 8 bytes: two int values, Offset and ContentLength.
Offset is the position of the first byte of the corresponding record in the .shp file. ContentLength is the same as the content length stored in that record's header in the .shp file. Both values are big-endian and measured in 16-bit words.
1.4 .dbf file
I don't know the details of this format yet. I will update this section when I need to use it.

2. C++ class design

2.1 Idea
A shapefile is a map data file. A map contains multiple layers, and each layer contains points, lines, and polygons. Points make up lines, and lines make up polygons. At the same time, a layer can be made up of points, and equally of lines or polygons.
So points, lines, and polygons are all objects, layers are objects too, and a map composed of multiple layers is also an object.
This gives us the relationships between the classes.
2.2 Code implementation
Points, lines, and polygons have something in common: they all store point coordinates, they are all read from the file, and they share attributes such as NumParts, NumPoints, and Box. So before designing the point, line, and polygon classes, a Shape class needs to be abstracted.
All of the code in this article has run successfully in Qt. It is copied directly from my own project, so it should run without problems.
2.2.1 Shape class
The code is as follows:

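#include <vector>

// One X/Y coordinate pair
struct strXY
{
    double dX;
    double dY;
};

class Shape
{
public:
    // Virtual destructor so that derived objects can be deleted through a Shape*
    virtual ~Shape() {}

    int _vParts[1000];
    int _iNumParts;
    int _iShapeType;
    double _dBox[4];
    std::vector<strXY> _vPoints;

    virtual void toShape(double, double) = 0;
};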

2.2.2 Point class
The code is as follows:

#include "shape.h"

class Point :public Shape
{
public:
    ~Point();
    void toShape(double, double);
};

2.2.3 LinePolygon class
Polyline and Polygon have some attributes in common, namely the number of points and the extent of each record, so these are placed in a shared LinePolygon class.
The code is as follows:

#include "shape.h"

class LinePolygon :public Shape
{
public:
    int _iNumPnts;    // number of points in the record
    double _dBox[4];  // extent of the record (hides Shape::_dBox)

    virtual void toShape(double, double) = 0;
};

2.2.4 Polyline class
The code is as follows:

#include "linepolygon.h"

class Polyline :public LinePolygon
{
public:
    void toShape(double, double);
};

2.2.5 Polygon class
The code is as follows:

#include "linepolygon.h"

class Polygon :public LinePolygon
{
public:
    void toShape(double, double);
};

2.2.6 Layer class
Each file is a layer, and each layer contains a number of geometric elements. A layer therefore has some properties of its own, as follows:

    int _iShpTyp;  // shape type
    double _dBox[4];  // bounding box
    int _iRecnt;  // number of records
    string _sfilename;  // file name
    vector<Shape*> _vShape;  // collection of records

A layer also has to work with the file, so it needs the ability to read it, that is, a series of file-reading functions, as follows:

    bool loadFile(string);
    void readFilehead(ifstream&);
    void toPoint(ifstream& inFile,int iIndex);
    void toPolyline(ifstream& inFile,int iIndex);
    void toPolygon(ifstream& inFile,int iIndex);
    void toPointZ(ifstream& inFile,int iIndex);
    void readRecContent(string);
    int getReCnt(string);

So the entire Layer header file is:

#include <fstream>
#include <string>
#include <vector>
#include "shape.h"
#include "point.h"
#include "polyline.h"
#include "polygon.h"

using namespace std;

class Layer
{
protected:
    void readFilehead(ifstream&);

    void toPoint(ifstream& inFile, int iIndex);
    void toPolyline(ifstream& inFile, int iIndex);
    void toPolygon(ifstream& inFile, int iIndex);
    void toPointZ(ifstream& inFile, int iIndex);

    void readRecContent(string);
    int getReCnt(string);
public:
    Layer();
    ~Layer();

    bool loadFile(string);

    Point* _pPoint;
    Polyline* _pLine;
    Polygon* _pPolygon;
    int _iShpTyp;  // shape type
    double _dBox[4];  // bounding box
    int _iRecnt;  // number of records
    string _sfilename;  // file name
    vector<Shape*> _vShape;  // collection of records
    double _dRecordBox[50000][4];
};

2.2.7 Map class
A Map object contains many Layers, just as a Layer contains many Shapes, so the Map class can be designed by analogy:

#include <vector>
#include "layer.h"
#include "box.h"
#include <QPoint>
#include <QPolygon>

class Map
{
public:
    Map();
    Map(string);

    // Custom functions
    void addFile(string);  // add a layer to the map
    void addFile(string*, int);  // overload of addFile

    // Attributes
    double _dBox[4];  // Box of the whole map: the union of all the layers' Boxes
    std::vector<Layer*> _vMap;  // stores all the layers

protected:
    Box _box;
    void setBox();
};
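The map's Box is described as the union of all the layers' Boxes. The original setBox implementation is not shown, so the following is only a sketch of how such a union could be computed, assuming each Box is stored as Xmin, Ymin, Xmax, Ymax. The alias Box4 and the function unionBoxes are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical alias: Xmin, Ymin, Xmax, Ymax
using Box4 = std::array<double, 4>;

// Compute the smallest box that contains every box in `boxes`.
// Returns false (and leaves `result` untouched) when `boxes` is empty.
bool unionBoxes(const std::vector<Box4>& boxes, Box4& result)
{
    if (boxes.empty())
        return false;

    result = boxes.front();
    for (const Box4& b : boxes)
    {
        result[0] = std::min(result[0], b[0]);  // Xmin
        result[1] = std::min(result[1], b[1]);  // Ymin
        result[2] = std::max(result[2], b[2]);  // Xmax
        result[3] = std::max(result[3], b[3]);  // Ymax
    }
    return true;
}
```

Starting from the first box, rather than from zeros, avoids wrongly including the origin in the result when no layer covers it.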

2.2.8 Summary
The syntax of classes is very simple, but mapping abstract concepts onto concrete objects and describing what they have in common is very difficult.
As my teacher said, when designing classes there is only one thing to keep hold of: keep it simple and direct.
There is a long way to go, and C++ is hard on a programmer's brain.

3. Overall code

There is not much to say here. Pasting the code directly would make this post very long, so if you want to see the source code, you can download it.
As for why I am providing the source code: good things should not be kept to yourself. Open source is great!

Despise yourself and embrace the open source world, Oye!


Origin blog.csdn.net/GeomasterYi/article/details/106434452