Examples of commonly used libtorch API functions (the most complete and detailed collection)

pytorch/libtorch QQ group: 1041467052

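In practice, nearly every PyTorch function has a libtorch counterpart; only the way you write it differs slightly.
Link to the official libtorch documentation:
 class Tensor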

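However, the official documentation is little more than a list of function declarations. It does not say what each function does, so you have to guess from the name. For example, I wanted a function that creates a tensor with the same shape as an existing torch::Tensor but filled with a given value. I remembered seeing a function whose name started with full, so I searched for full and found full_like, which looked like what I needed (see section 0).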

Debugging tips:

torch::Tensor box_1 = torch::rand({5,4});
std::cout<<box_1<<std::endl; // prints the values
box_1.print(); // prints the type and shape

CMakeLists.txt

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(main)
SET(CMAKE_BUILD_TYPE "Debug")

set(CMAKE_PREFIX_PATH "/data_2/everyday/0429/pytorch/torch")
find_package(Torch REQUIRED)

set(CMAKE_PREFIX_PATH "/home/yhl/software_install/opencv3.2")
find_package(OpenCV REQUIRED)

add_executable(main main.cpp)
target_link_libraries(main "${TORCH_LIBRARIES}")

target_link_libraries(main ${OpenCV_LIBS})
set_property(TARGET main PROPERTY CXX_STANDARD 11)

0.torch::full_like

static Tensor at::full_like(const Tensor &self, Scalar fill_value, const TensorOptions &options = {}, c10::optional<MemoryFormat> memory_format = c10::nullopt)
Then I tried it myself:

#include <iostream>
#include "torch/script.h"
#include "torch/torch.h"
using namespace std;

int main() {   
    torch::Tensor tmp_1 = torch::rand({2,3});
    torch::Tensor tmp_2 = torch::full_like(tmp_1,1);
    
    cout<<tmp_1<<endl;
    cout<<tmp_2<<endl;
}

The printed result is as follows:
0.8465 0.5771 0.4404
0.9805 0.8665 0.7807
[ Variable[CPUFloatType]{2,3} ]
1 1 1
1 1 1
[ Variable[CPUFloatType]{2,3} ]

1. Creating and initializing tensors

1.1 torch::rand

1.2 torch::empty

1.3 torch::ones

1.4 torch::Tensor keep = torch::zeros({scores.size(0)}).to(torch::kLong).to(scores.device());

1.5 torch::Tensor num_out = torch::full({ 2,3 }, -2, torch::dtype(torch::kLong)); torch::full creates a tensor with the specified shape and fill value

1.6 torch::Tensor a = torch::ones({3,2}).fill_(-8).to(torch::kCUDA);

1.7 torch::full_like (see section 0) creates a tensor with the same shape as an existing tensor and fills it with the specified value

1.1 torch::rand

torch::Tensor input = torch::rand({ 1,3,2,3 });

(1,1,.,.) =
0.5943 0.4822 0.6663
0.7099 0.0374 0.9833

(1,2,.,.) =
0.4384 0.4567 0.2143
0.3967 0.4999 0.9196

(1,3,.,.) =
0.2467 0.5066 0.8654
0.7873 0.4758 0.3718
[ Variable[CPUFloatType]{1,3,2,3} ]

1.2 torch::empty

   torch::Tensor a = torch::empty({2, 4});
    std::cout << a << std::endl;

7.0374e+22 5.7886e+22 6.7120e+22 6.7331e+22
6.7120e+22 1.8515e+28 7.3867e+20 9.2358e-01
[ Variable[CPUFloatType]{2,4} ]

1.3 torch::ones

    torch::Tensor a = torch::ones({2, 4});
    std::cout << a<< std::endl;

1 1 1 1
1 1 1 1
[ Variable[CPUFloatType]{2,4} ]

1.4 torch::zeros

 torch::Tensor scores = torch::rand({10}); // placeholder scores (assumed here) so that size(0) and device() are valid
 torch::Tensor keep = torch::zeros({scores.size(0)}).to(torch::kLong).to(scores.device());

1.5 torch::full
inline at::Tensor full(at::IntArrayRef size, at::Scalar fill_value, c10::optional<at::DimnameList> names, const at::TensorOptions & options = {})
inline at::Tensor full(at::IntArrayRef size, at::Scalar fill_value, const at::TensorOptions & options = {})

    torch::Tensor num_out = torch::full({ 2,3 }, -2, torch::dtype(torch::kLong));
    std::cout<<num_out<<std::endl;

1.6 torch::Tensor a = torch::ones({3,2}).fill_(-8).to(torch::kCUDA);

    torch::Tensor a = torch::ones({3,2}).fill_(-8).to(torch::kCUDA);
    std::cout<<a<<std::endl;

-8 -8
-8 -8
-8 -8
[ Variable[CUDAFloatType]{3,2} ]

2. Concatenating tensors with torch::cat, and combining std::vector with cat

2.1 Concatenating by columns

    torch::Tensor a = torch::rand({2,3});
    torch::Tensor b = torch::rand({2,1});
    torch::Tensor cat_1 = torch::cat({a,b},1); // concatenate along dim 1 (columns); the row counts must match

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<cat_1<<std::endl;

0.3551 0.7215 0.3603
0.1188 0.4577 0.2201
[ Variable[CPUFloatType]{2,3} ]
0.5876
0.3040
[ Variable[CPUFloatType]{2,1} ]
0.3551 0.7215 0.3603 0.5876
0.1188 0.4577 0.2201 0.3040
[ Variable[CPUFloatType]{2,4} ]
Note: if the row counts differ, the following error is raised:
terminate called after throwing an instance of 'std::runtime_error'
what(): invalid argument 0: Sizes of tensors must match except in dimension 1. Got 2 and 4 in dimension 0 at /data_2/everyday/0429/pytorch/aten/src/TH/generic/THTensor.cpp:689

2.2 Concatenating by rows

    torch::Tensor a = torch::rand({2,3});
    torch::Tensor b = torch::rand({1,3});
    torch::Tensor cat_1 = torch::cat({a,b},0);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<cat_1<<std::endl;

0.0004 0.7852 0.4586
0.1612 0.6524 0.7655
[ Variable[CPUFloatType]{2,3} ]
0.5999 0.5445 0.2152
[ Variable[CPUFloatType]{1,3} ]
0.0004 0.7852 0.4586
0.1612 0.6524 0.7655
0.5999 0.5445 0.2152
[ Variable[CPUFloatType]{3,3} ]

2.3 Another example

    torch::Tensor box_1 = torch::rand({5,4});
    torch::Tensor score_1 = torch::rand({5,1});
    torch::Tensor label_1 = torch::rand({5,1});
    torch::Tensor result_1 = torch::cat({box_1,score_1,label_1},1);
    result_1.print();

[Variable[CPUFloatType] [5, 6]]

2.4 Combining std::vector with cat

    torch::Tensor xs_t0 = xs - wh_0 / 2;
    torch::Tensor ys_t0 = ys - wh_1 / 2;
    torch::Tensor xs_t1 = xs + wh_0 / 2;
    torch::Tensor ys_t1 = ys + wh_1 / 2;
    xs_t0.print();
    ys_t0.print();
    xs_t1.print();
    ys_t1.print();
    vector<torch::Tensor> abce = {xs_t0,ys_t0,xs_t1,ys_t1};
    torch::Tensor bboxes = torch::cat(abce,2);
    std::cout<<"-----cat   shape---"<<std::endl;
    bboxes.print();
    while(1);

The output is as follows:

[Variable[CUDAType] [1, 100, 1]]
[Variable[CUDAType] [1, 100, 1]]
[Variable[CUDAType] [1, 100, 1]]
[Variable[CUDAType] [1, 100, 1]]
[Variable[CUDAType] [1, 100, 4]]
-----cat   shape---

It can also be done in a single statement:

 torch::Tensor bboxes = torch::cat({xs_t0,ys_t0,xs_t1,ys_t1},2);

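3. Slicing tensors: select (shallow copy), index_select (deep copy), index (deep copy), slice (shallow copy), narrow, narrow_copy

select [shallow copy]: takes a single row or a single column
index [deep copy]: in my experiments, only takes rows
index_select [deep copy]: takes several rows or several columns, by row or by column
slice [shallow copy]: takes a contiguous range of rows or columns
narrow [shallow copy], narrow_copy [deep copy]: take length consecutive entries starting at start

When the result is a shallow copy and you do not want later changes to affect the original tensor, add clone(), for example: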

 torch::Tensor x1 = boxes.select(1,0).clone();

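3.1 inline Tensor Tensor::select(int64_t dim, int64_t index); The first argument is the dimension (for a 2-D tensor, 0 takes a row and 1 takes a column) and the second is the index along that dimension. The examples here are 2-D, but select is not limited to 2-D tensors.
3.1.1 select: taking a row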

    torch::Tensor a = torch::rand({2,3});
    std::cout<<a<<std::endl;
    torch::Tensor b = a.select(0,1); // take row 1
    std::cout<<b<<std::endl;

0.6201 0.7021 0.1975
0.3080 0.6304 0.1558
[ Variable[CPUFloatType]{2,3} ]
0.3080
0.6304
0.1558
[ Variable[CPUFloatType]{3} ]
3.1.2 select: taking a column

    torch::Tensor a = torch::rand({2,3});
    std::cout<<a<<std::endl;

    torch::Tensor b = a.select(1,1);
    std::cout<<b<<std::endl;

0.8295 0.9871 0.1287
0.8466 0.7719 0.2354
[ Variable[CPUFloatType]{2,3} ]
0.9871
0.7719
[ Variable[CPUFloatType]{2} ]
Note: this is a shallow copy, so changing b changes the corresponding values in a as well.
3.1.3 select is a shallow copy

    torch::Tensor a = torch::rand({2,3});
    std::cout<<a<<std::endl;
    
    torch::Tensor b = a.select(1,1);
    std::cout<<b<<std::endl;
    
    b[0] = 0.0;
    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;

0.0938 0.2861 0.0089
0.3481 0.5806 0.3711
[ Variable[CPUFloatType]{2,3} ]
0.2861
0.5806
[ Variable[CPUFloatType]{2} ]
0.0938 0.0000 0.0089
0.3481 0.5806 0.3711
[ Variable[CPUFloatType]{2,3} ]
0.0000
0.5806
[ Variable[CPUFloatType]{2} ]
As you can see, after b[0] = 0.0; the corresponding position is 0 in both a and b. It is a shallow copy!
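The shallow-copy behaviour can be pictured with a small model in plain C++. This is not libtorch code and not how libtorch is implemented internally; StridedView, select_column and clone_column are hypothetical names introduced only for illustration. The sketch assumes a row-major matrix stored in a std::vector<float>: a view only remembers an offset and a stride into the shared buffer, while a clone copies the elements into new storage.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal "view": it owns no data, it only remembers where to
// look inside someone else's buffer.
struct StridedView {
    float* base;         // start of the shared buffer
    std::size_t offset;  // position of the first element of the view
    std::size_t stride;  // distance between consecutive view elements
    std::size_t size;    // number of elements in the view

    float& operator[](std::size_t i) { return base[offset + i * stride]; }
};

// Model of select(1, col) on a row-major rows x cols matrix: no copy is made.
inline StridedView select_column(std::vector<float>& m, std::size_t rows,
                                 std::size_t cols, std::size_t col) {
    return StridedView{m.data(), col, cols, rows};
}

// Model of select(1, col).clone(): the elements are copied into new storage.
inline std::vector<float> clone_column(const std::vector<float>& m,
                                       std::size_t rows, std::size_t cols,
                                       std::size_t col) {
    std::vector<float> out(rows);
    for (std::size_t r = 0; r < rows; ++r) out[r] = m[r * cols + col];
    return out;
}
```

Writing through the view changes the matrix, while the cloned column keeps its old values, which is the same pattern as b[0] = 0.0 changing a in the example above.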

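3.2 inline Tensor Tensor::index_select(Dimname dim, const Tensor & index) // As before, dim 0 means by row and dim 1 means by column (the examples use the overload that takes an int64_t dim); index holds the row or column numbers to take.
Note that index must have type torch::kLong, hence the toType(torch::kLong). Another oddity: when I tried to build idx from an array with from_blob, idx came out as all zeros, and I did not track down the cause.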

 torch::Tensor a = torch::rand({2,6});
    std::cout<<a<<std::endl;

     torch::Tensor idx = torch::empty({4}).toType(torch::kLong);
     idx[0]=0;
     idx[1]=2;
     idx[2]=4;
     idx[3]=1;

//    int idx_data[4] = {1,3,2,4};
//    torch::Tensor idx = torch::from_blob(idx_data,{4}).toType(torch::kLong); // idx comes out as all zeros?

    std::cout<<idx<<std::endl;

    torch::Tensor b = a.index_select(1,idx);
    std::cout<<b<<std::endl;

0.4956 0.5028 0.0863 0.9464 0.6714 0.5348
0.3523 0.2245 0.0924 0.7088 0.6913 0.2237
[ Variable[CPUFloatType]{2,6} ]
0
2
4
1
[ Variable[CPULongType]{4} ]
0.4956 0.0863 0.6714 0.5028
0.3523 0.0924 0.6913 0.2245
[ Variable[CPUFloatType]{2,4} ]
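One likely explanation for the all-zero idx in the commented-out from_blob lines (an assumption; the original experiment did not verify it): when no dtype is given, torch::from_blob treats the buffer as float32, so the bytes of the int values 1, 3, 2, 4 are read as extremely small floats, and toType(torch::kLong) then truncates them to 0. The plain C++ sketch below reproduces that reinterpretation without libtorch; bits_as_float and reinterpret_then_truncate are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the 4 bytes of a 32-bit int as a float, which is what happens
// when an int buffer is read as if it held float32 values.
inline float bits_as_float(std::int32_t v) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "expects a 32-bit float");
    float f;
    std::memcpy(&f, &v, sizeof f);
    return f;
}

// Convert the reinterpreted value to an integer, truncating toward zero,
// as a float-to-integer cast does.
inline long long reinterpret_then_truncate(std::int32_t v) {
    return static_cast<long long>(bits_as_float(v));
}
```

If this is indeed the cause, telling from_blob the real element type, for example torch::from_blob(idx_data, {4}, torch::kInt).toType(torch::kLong), should give the expected indices.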

3.2.2 index_select is a deep copy

    torch::Tensor a = torch::rand({2,6});
    std::cout<<a<<std::endl;


     torch::Tensor idx = torch::empty({4}).toType(torch::kLong);
     idx[0]=0;
     idx[1]=2;
     idx[2]=4;
     idx[3]=1;

//    int idx_data[4] = {1,3,2,4};
//    torch::Tensor idx = torch::from_blob(idx_data,{4}).toType(torch::kLong);

    std::cout<<idx<<std::endl;

    torch::Tensor b = a.index_select(1,idx);
    std::cout<<b<<std::endl;

    b[0][0]=0.0;
    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;

0.6118 0.6078 0.5052 0.9489 0.6201 0.8975
0.0901 0.2040 0.1452 0.6452 0.9593 0.7454
[ Variable[CPUFloatType]{2,6} ]
0
2
4
1
[ Variable[CPULongType]{4} ]
0.6118 0.5052 0.6201 0.6078
0.0901 0.1452 0.9593 0.2040
[ Variable[CPUFloatType]{2,4} ]
0.6118 0.6078 0.5052 0.9489 0.6201 0.8975
0.0901 0.2040 0.1452 0.6452 0.9593 0.7454
[ Variable[CPUFloatType]{2,6} ]
0.0000 0.5052 0.6201 0.6078
0.0901 0.1452 0.9593 0.2040
[ Variable[CPUFloatType]{2,4} ]

3.3 index inline Tensor Tensor::index(TensorList indices)
In my experiments this function only takes rows, and it is a deep copy.

    torch::Tensor a = torch::rand({2,6});
    std::cout<<a<<std::endl;


    torch::Tensor idx_1 = torch::empty({2}).toType(torch::kLong);
    idx_1[0]=0;
    idx_1[1]=1;


    torch::Tensor bb = a.index(idx_1);
    bb[0][0]=0;

    std::cout<<bb<<std::endl;
    std::cout<<a<<std::endl;

0.1349 0.8087 0.2659 0.3364 0.0202 0.4498
0.4785 0.4274 0.9348 0.0437 0.6732 0.3174
[ Variable[CPUFloatType]{2,6} ]
0.0000 0.8087 0.2659 0.3364 0.0202 0.4498
0.4785 0.4274 0.9348 0.0437 0.6732 0.3174
[ Variable[CPUFloatType]{2,6} ]
0.1349 0.8087 0.2659 0.3364 0.0202 0.4498
0.4785 0.4274 0.9348 0.0437 0.6732 0.3174
[ Variable[CPUFloatType]{2,6} ]
3.4 slice inline Tensor Tensor::slice(int64_t dim, int64_t start, int64_t end, int64_t step) // dim 0 slices rows and dim 1 slices columns, from start up to but not including end
The result shows that it is a shallow copy!

 torch::Tensor a = torch::rand({2,6});
    std::cout<<a<<std::endl;

    torch::Tensor b = a.slice(0,0,1);
    torch::Tensor c = a.slice(1,0,3);

    b[0][0]=0.0;

    std::cout<<b<<std::endl;
    std::cout<<c<<std::endl;

    std::cout<<a<<std::endl;

0.8270 0.7952 0.3743 0.7992 0.9093 0.5945
0.3764 0.8419 0.7977 0.4150 0.8531 0.9207
[ Variable[CPUFloatType]{2,6} ]
0.0000 0.7952 0.3743 0.7992 0.9093 0.5945
[ Variable[CPUFloatType]{1,6} ]
0.0000 0.7952 0.3743
0.3764 0.8419 0.7977
[ Variable[CPUFloatType]{2,3} ]
0.0000 0.7952 0.3743 0.7992 0.9093 0.5945
0.3764 0.8419 0.7977 0.4150 0.8531 0.9207
[ Variable[CPUFloatType]{2,6} ]

3.5 narrow narrow_copy
inline Tensor Tensor::narrow(int64_t dim, int64_t start, int64_t length) const
inline Tensor Tensor::narrow_copy(int64_t dim, int64_t start, int64_t length) const

    torch::Tensor a = torch::rand({4,6});
    torch::Tensor b = a.narrow(0,1,2);
    torch::Tensor c = a.narrow_copy(0,1,2);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<c<<std::endl;

0.9812 0.4205 0.4169 0.2412 0.8769 0.9873
0.8052 0.0312 0.9901 0.5065 0.6344 0.3408
0.0182 0.6933 0.9375 0.8675 0.5201 0.9521
0.5119 0.3880 0.1117 0.5413 0.8203 0.4163
[ Variable[CPUFloatType]{4,6} ]
0.8052 0.0312 0.9901 0.5065 0.6344 0.3408
0.0182 0.6933 0.9375 0.8675 0.5201 0.9521
[ Variable[CPUFloatType]{2,6} ]
0.8052 0.0312 0.9901 0.5065 0.6344 0.3408
0.0182 0.6933 0.9375 0.8675 0.5201 0.9521
[ Variable[CPUFloatType]{2,6} ]

4.squeeze() unsqueeze()

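inline Tensor Tensor::squeeze() const // no argument: removes every dimension of size 1
inline Tensor Tensor::squeeze(int64_t dim) const // with an argument: removes only the given dimension, and only if its size is 1
inline Tensor & Tensor::squeeze_() const // in-place version: modifies the tensor itself instead of returning a new one
inline Tensor & Tensor::squeeze_(int64_t dim) const // in-place version for a single dimension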
4.1 squeeze()

(1,.,.) = 
  0.5516  0.6561  0.3603
  0.7555  0.1048  0.2016
[ Variable[CPUFloatType]{1,2,3} ]
 0.5516  0.6561  0.3603
 0.7555  0.1048  0.2016
[ Variable[CPUFloatType]{2,3} ]
(1,.,.) = 
  0.7675  0.5439  0.5162

(2,.,.) = 
  0.6103  0.1925  0.1222
[ Variable[CPUFloatType]{2,1,3} ]
 0.7675  0.5439  0.5162
 0.6103  0.1925  0.1222
[ Variable[CPUFloatType]{2,3} ]
(1,1,.,.) = 
  0.9875
  0.1980

(2,1,.,.) = 
  0.6973
  0.3272
[ Variable[CPUFloatType]{2,1,2,1} ]
 0.9875  0.1980
 0.6973  0.3272
[ Variable[CPUFloatType]{2,2} ]

4.2 squeeze(int64_t dim): specify which dimension to squeeze

    torch::Tensor a = torch::rand({1,1,3});
    std::cout<<a<<std::endl;
    
    torch::Tensor b = a.squeeze();
    std::cout<<b<<std::endl;
    
    torch::Tensor c = a.squeeze(0);
    std::cout<<c<<std::endl;
    
    torch::Tensor d = a.squeeze(1);
    std::cout<<d<<std::endl;
    
    torch::Tensor e = a.squeeze(2);
    std::cout<<e<<std::endl;

(1,.,.) =
0.8065 0.1287 0.8073
[ Variable[CPUFloatType]{1,1,3} ]
0.8065
0.1287
0.8073
[ Variable[CPUFloatType]{3} ]
0.8065 0.1287 0.8073
[ Variable[CPUFloatType]{1,3} ]
0.8065 0.1287 0.8073
[ Variable[CPUFloatType]{1,3} ]
(1,.,.) =
0.8065 0.1287 0.8073
[ Variable[CPUFloatType]{1,1,3} ]
4.3. unsqueeze

    torch::Tensor a = torch::rand({2,3});
    std::cout<<a<<std::endl;

    torch::Tensor b = a.unsqueeze(0);
    std::cout<<b<<std::endl;

    torch::Tensor bb = a.unsqueeze(1);
    std::cout<<bb<<std::endl;

    torch::Tensor bbb = a.unsqueeze(2);
    std::cout<<bbb<<std::endl;

0.7945 0.0331 0.1666
0.7821 0.3359 0.0663
[ Variable[CPUFloatType]{2,3} ]
(1,.,.) =
0.7945 0.0331 0.1666
0.7821 0.3359 0.0663
[ Variable[CPUFloatType]{1,2,3} ]
(1,.,.) =
0.7945 0.0331 0.1666

(2,.,.) =
0.7821 0.3359 0.0663
[ Variable[CPUFloatType]{2,1,3} ]
(1,.,.) =
0.7945
0.0331
0.1666

(2,.,.) =
0.7821
0.3359
0.0663
[ Variable[CPUFloatType]{2,3,1} ]
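The shape rules seen in the outputs of this section can be summarized in a short sketch that works on shapes only. It is plain C++, not libtorch; Shape, squeeze_all, squeeze_dim and unsqueeze_dim are hypothetical names, and negative dimension indices are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<std::int64_t>;

// squeeze(): drop every dimension whose size is 1.
inline Shape squeeze_all(const Shape& s) {
    Shape out;
    for (std::int64_t d : s) {
        if (d != 1) out.push_back(d);
    }
    return out;
}

// squeeze(dim): drop dimension `dim` only if its size is 1,
// otherwise return the shape unchanged.
inline Shape squeeze_dim(const Shape& s, std::size_t dim) {
    Shape out = s;
    if (dim < out.size() && out[dim] == 1) {
        out.erase(out.begin() + static_cast<std::ptrdiff_t>(dim));
    }
    return out;
}

// unsqueeze(dim): insert a new dimension of size 1 at position `dim`
// (the caller must pass dim <= s.size()).
inline Shape unsqueeze_dim(const Shape& s, std::size_t dim) {
    Shape out = s;
    out.insert(out.begin() + static_cast<std::ptrdiff_t>(dim), 1);
    return out;
}
```

This mirrors the outputs above: {1,1,3} squeezed on dimension 2 stays {1,1,3} because that dimension has size 3, and unsqueeze(1) on {2,3} gives {2,1,3}.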

5. torch::nonzero returns the coordinates of the nonzero elements

    torch::Tensor a = torch::rand({2,3});
    a[0][1] = 0;
    a[1][2] = 0;
    std::cout<<a<<std::endl;
     torch::Tensor b = torch::nonzero(a);
     std::cout<<b<<std::endl;

0.4671 0.0000 0.3360
0.9320 0.9246 0.0000
[ Variable[CPUFloatType]{2,3} ]
0 0
0 2
1 0
1 1
[ Variable[CPULongType]{4,2} ]
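To make the {4,2} result above easier to read: each row of the output is one (row, column) coordinate, listed in row-major order. A minimal plain C++ model of that behaviour for a 2-D matrix follows; it is not libtorch code, and nonzero_2d is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Scan a 2-D matrix in row-major order and collect the (row, col)
// coordinates of every element that is not exactly zero.
inline std::vector<std::pair<std::size_t, std::size_t>> nonzero_2d(
        const std::vector<std::vector<float>>& m) {
    std::vector<std::pair<std::size_t, std::size_t>> coords;
    for (std::size_t r = 0; r < m.size(); ++r) {
        for (std::size_t c = 0; c < m[r].size(); ++c) {
            if (m[r][c] != 0.0f) coords.emplace_back(r, c);
        }
    }
    return coords;
}
```

Applied to the matrix printed above, this yields (0,0), (0,2), (1,0), (1,1), the same four coordinates shown in the output.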

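6. Accessing tensor values: a.item() converts a 1*1 tensor a to a float

To get one element of a tensor as an int or float: auto bbb = a[1][1].item().toFloat();
Usually you can read an element simply by indexing, for example a[0][1], but the result is still a tensor. To get a C++ int or float, do the following: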

    torch::Tensor a = torch::rand({2,3});
    std::cout<<a<<std::endl;
    auto bbb = a[1][1].item().toFloat();
    std::cout<<bbb<<std::endl;

0.7303 0.6608 0.0024
0.5917 0.0145 0.6472
[ Variable[CPUFloatType]{2,3} ]
0.014509
[ Variable[CPUFloatType]{} ]
0.014509

Another example:

    torch::Tensor scores = torch::rand({10});
    std::tuple<torch::Tensor,torch::Tensor> sort_ret = torch::sort(scores.unsqueeze(1), 0, 1);
    torch::Tensor v = std::get<0>(sort_ret).squeeze(1).to(scores.device());
    torch::Tensor idx = std::get<1>(sort_ret).squeeze(1).to(scores.device());
    std::cout<<scores<<std::endl;
    std::cout<<v<<std::endl;
    std::cout<<idx<<std::endl;

    for(int i=0;i<10;i++)
    {
         int idx_1 = idx[i].item<int>();
         float s = v[i].item<float>();

          std::cout<<idx_1<<"  "<<s<<std::endl;
    }

0.1125
0.9524
0.7033
0.3204
0.7907
0.8486
0.7783
0.3215
0.0378
0.7512
[ Variable[CPUFloatType]{10} ]
0.9524
0.8486
0.7907
0.7783
0.7512
0.7033
0.3215
0.3204
0.1125
0.0378
[ Variable[CPUFloatType]{10} ]
1
5
4
6
9
2
7
3
0
8
[ Variable[CPULongType]{10} ]
1 0.952351
5 0.848641
4 0.790685
6 0.778329
9 0.751163
2 0.703278
7 0.32146
3 0.320435
0 0.112517
8 0.0378203

7. Converting an OpenCV Mat to a tensor, or converting a vector or array to a tensor

7.1

   Mat m_out = imread(path);
 //[320,320,3]
    input_tensor = torch::from_blob(
                m_out.data, {m_SIZE_IMAGE, m_SIZE_IMAGE, 3}).toType(torch::kFloat32); // not torch::kByte: a big pitfall, see the note below
    //[3,320,320]
    input_tensor = input_tensor.permute({2,0,1});
    input_tensor = input_tensor.unsqueeze(0);
    input_tensor = input_tensor.to(torch::kFloat).to(m_device);

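Note: the image above had already been preprocessed by subtracting the mean, so m_out contains negative pixel values. Reading it as torch::kByte would turn the negative values into positive ones, so torch::kFloat32 is required.
permute({2,0,1});
Before the call, the OpenCV Mat dimensions sit at positions
0 1 2
[320,320,3]
permute({2,0,1}) moves the dimensions into the listed order, so the shape becomes [3,320,320].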
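For readers who want to see what this reordering means for individual pixels, here is a plain C++ sketch of the index mapping from interleaved H x W x C storage to planar C x H x W storage. It is an illustration only: hwc_to_chw is a hypothetical helper, and unlike permute, which (as far as I know) only changes how the existing memory is indexed, this version copies the data into a new buffer.

```cpp
#include <cstddef>
#include <vector>

// Copy an interleaved H x W x C buffer (the OpenCV layout) into planar
// C x H x W order (the layout after permute({2,0,1})).
inline std::vector<float> hwc_to_chw(const std::vector<float>& src,
                                     std::size_t h, std::size_t w,
                                     std::size_t c) {
    std::vector<float> dst(src.size());
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            for (std::size_t ch = 0; ch < c; ++ch) {
                // element (y, x, ch) of the source is element (ch, y, x) of the result
                dst[(ch * h + y) * w + x] = src[(y * w + x) * c + ch];
            }
        }
    }
    return dst;
}
```

For a 1 x 2 image with three channels, the pixels {0,1,2} and {10,11,12} become the three planes {0,10}, {1,11}, {2,12}.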

7.2

std::vector<float> region_priors;
//region_priors.push_back(num)  region_priors holds 6375 × 4 values
torch::Tensor m_prior = torch::from_blob(region_priors.data(),{6375,4}).cuda();

8. Tensor size: size(), sizes(), numel()

    torch::Tensor a = torch::rand({2,3});
    std::cout<<a<<std::endl;

    auto aa = a.size(0);
    auto bb = a.size(1);
    auto a_size = a.sizes();
    std::cout<<aa<<std::endl;
    std::cout<<bb<<std::endl;
    std::cout<<a_size<<std::endl;

    int num_ = a.numel();
    std::cout<<num_<<std::endl;

0.6522 0.0480 0.0009
0.1185 0.4639 0.0386
[ Variable[CPUFloatType]{2,3} ]
2
3
[2, 3]
6

8.2
One problem: if you only declare a tensor with torch::Tensor a; and then access

    torch::Tensor a;
     auto a_size = a.sizes();

you get an error:
terminate called after throwing an instance of 'c10::Error'
what(): sizes() called on undefined Tensor (sizes at /data_2/everyday/0429/pytorch/c10/core/UndefinedTensorImpl.cpp:12)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x6a (0x7f83b563f0aa in /data_2/everyday/0429/pytorch/torch/lib/libc10.so)
frame #1: c10::UndefinedTensorImpl::sizes() const + 0x258 (0x7f83b56362b8 in /data_2/everyday/0429/pytorch/torch/lib/libc10.so)
frame #2: at::Tensor::sizes() const + 0x27 (0x405fc9 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #3: main + 0x30 (0x405d06 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #4: __libc_start_main + 0xf0 (0x7f83b4d12830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: _start + 0x29 (0x405c09 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)

The program terminates abnormally.
Using numel() causes no problem:

    torch::Tensor a;
    int num_ = a.numel();
    std::cout<<num_<<std::endl;

8.3 Getting the number of dimensions: for a shape such as [1,5,8,2], I need to get 4

auto aaa = img_poly.sizes();
int len_ = aaa.size();

9.torch::sort

static inline std::tuple<Tensor,Tensor> sort(const Tensor & self, Dimname dim, bool descending)
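dim is the dimension to sort along: 0 sorts down the rows (within each column), 1 sorts across the columns (within each row).
descending=false sorts in ascending order, true in descending order.
The return value is a tuple: the first element holds the sorted values, the second holds the original index of each sorted value.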

    torch::Tensor scores = torch::rand({10});
    std::tuple<torch::Tensor,torch::Tensor> sort_ret = torch::sort(scores.unsqueeze(1), 0, 1);
    torch::Tensor v = std::get<0>(sort_ret).squeeze(1).to(scores.device());
    torch::Tensor idx = std::get<1>(sort_ret).squeeze(1).to(scores.device());
    std::cout<<scores<<std::endl;
    std::cout<<v<<std::endl;
    std::cout<<idx<<std::endl;

0.8355
0.1386
0.7910
0.0988
0.2607
0.7810
0.7855
0.5529
0.5846
0.1403
[ Variable[CPUFloatType]{10} ]
0.8355
0.7910
0.7855
0.7810
0.5846
0.5529
0.2607
0.1403
0.1386
0.0988
[ Variable[CPUFloatType]{10} ]
0
2
6
5
8
7
4
9
1
3
[ Variable[CPULongType]{10} ]
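The relationship between the two returned tensors can be sketched in plain C++ as an "argsort": sort the indices by the values they point to, then read the values in that order. This is not how libtorch implements sort; argsort_descending is a hypothetical name and the sketch covers only the 1-D descending case used above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the indices that would sort `v` in descending order.
// Ties keep their original relative order (stable sort).
inline std::vector<std::size_t> argsort_descending(const std::vector<float>& v) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, 2, ...
    std::stable_sort(idx.begin(), idx.end(),
                     [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; });
    return idx;
}
```

With the scores printed above, the result is 0 2 6 5 8 7 4 9 1 3, matching the index tensor in the output, and v[idx[i]] gives the sorted values.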

10. clamp limits values to the range between min and max: values below min become min, values above max become max

inline Tensor Tensor::clamp(c10::optional<Scalar> min, c10::optional<Scalar> max) const

    torch::Tensor a = torch::rand({2,3});
    a[0][0] = 20;
    a[0][1] = 21;
    a[0][2] = 22;
    a[1][0] = 23;
    a[1][1] = 24;
    std::cout<<a<<std::endl;

    torch::Tensor b = a.clamp(21,22);
    std::cout<<b<<std::endl;

20.0000 21.0000 22.0000
23.0000 24.0000 0.4792
[ Variable[CPUFloatType]{2,3} ]
21 21 22
22 22 21
[ Variable[CPUFloatType]{2,3} ]
In real projects the bound usually has to be read from a tensor, and sometimes only one side is limited, for example only min, as follows:

 xx1 = xx1.clamp(x1[i].item().toFloat(),INT_MAX*1.0);

11. Greater-than > and less-than < comparisons

    torch::Tensor a = torch::rand({2,3});
    std::cout<<a<<std::endl;
    torch::Tensor b = a > 0.5;
    std::cout<<b<<std::endl;

0.3526 0.0321 0.7098
0.9794 0.6531 0.9410
[ Variable[CPUFloatType]{2,3} ]
0 0 1
1 1 1
[ Variable[CPUBoolType]{2,3} ]

12. Transpose: Tensor::transpose

inline Tensor Tensor::transpose(Dimname dim0, Dimname dim1) const

    torch::Tensor a = torch::rand({2,3});
    std::cout<<a<<std::endl;

    torch::Tensor b = a.transpose(1,0);
    std::cout<<b<<std::endl;

0.4039 0.3568 0.9978
0.6895 0.7258 0.5576
[ Variable[CPUFloatType]{2,3} ]
0.4039 0.6895
0.3568 0.7258
0.9978 0.5576
[ Variable[CPUFloatType]{3,2} ]

13.expand_as

inline Tensor Tensor::expand_as(const Tensor & other) const

    torch::Tensor a = torch::rand({2,3});
    //    torch::Tensor b = torch::ones({2,2});
    torch::Tensor b = torch::ones({2,1});
    torch::Tensor c = b.expand_as(a);
    
    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<c<<std::endl;

0.6063 0.4150 0.7665
0.8663 0.9563 0.7461
[ Variable[CPUFloatType]{2,3} ]
1
1
[ Variable[CPUFloatType]{2,1} ]
1 1 1
1 1 1
[ Variable[CPUFloatType]{2,3} ]

Note that the shapes must be compatible. Writing torch::Tensor b = torch::ones({2,2}); or torch::Tensor b = torch::ones({2}); both raise an error:
terminate called after throwing an instance of 'c10::Error'
what(): The expanded size of the tensor (3) must match the existing size (2) at non-singleton dimension 1. Target sizes: [2, 3]. Tensor sizes: [2, 2] (inferExpandGeometry at /data_2/everyday/0429/pytorch/aten/src/ATen/ExpandUtils.cpp:76)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x6a (0x7f6a488150aa in /data_2/everyday/0429/pytorch/torch/lib/libc10.so)
frame #1: at::inferExpandGeometry(c10::ArrayRef, c10::ArrayRef, c10::ArrayRef) + 0x76b (0x7f6a49df7a4b in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #2: at::native::expand(at::Tensor const&, c10::ArrayRef, bool) + 0x84 (0x7f6a4a1e4324 in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #3: + 0x1aeb9e1 (0x7f6a4a5189e1 in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #4: + 0x19e8a2e (0x7f6a4a415a2e in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #5: + 0x3509dee (0x7f6a4bf36dee in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #6: + 0x19e8a2e (0x7f6a4a415a2e in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #7: + 0x14e8a61 (0x7f6a49f15a61 in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #8: at::native::expand_as(at::Tensor const&, at::Tensor const&) + 0x39 (0x7f6a4a1e4d49 in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #9: + 0x1aece9f (0x7f6a4a519e9f in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #10: + 0x3680543 (0x7f6a4c0ad543 in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #11: + 0x19e6bb4 (0x7f6a4a413bb4 in /data_2/everyday/0429/pytorch/torch/lib/libtorch.so)
frame #12: at::Tensor c10::KernelFunction::callUnboxed<at::Tensor, at::Tensor const&, at::Tensor const&>(at::Tensor const&, at::Tensor const&) const + 0xb0 (0x433e06 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #13: at::Tensor c10::impl::OperatorEntry::callUnboxed<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TensorTypeId, at::Tensor const&, at::Tensor const&) const::{lambda(c10::DispatchTable const&)#1}::operator()(c10::DispatchTable const&) const + 0x79 (0x432525 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #14: std::result_of<at::Tensor c10::impl::OperatorEntry::callUnboxed<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TensorTypeId, at::Tensor const&, at::Tensor const&) const::{lambda(c10::DispatchTable const&)#1} (c10::DispatchTable const&)>::type c10::LeftRightc10::DispatchTable::read<at::Tensor c10::impl::OperatorEntry::callUnboxed<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TensorTypeId, at::Tensor const&, at::Tensor const&) const::{lambda(c10::DispatchTable const&)#1}>(at::Tensor c10::impl::OperatorEntry::callUnboxed<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TensorTypeId, at::Tensor const&, at::Tensor const&) const::{lambda(c10::DispatchTable const&)#1}&&) const + 0x11c (0x4340ba in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #15: at::Tensor c10::impl::OperatorEntry::callUnboxed<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TensorTypeId, at::Tensor const&, at::Tensor const&) const + 0x5f (0x4325a5 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #16: at::Tensor c10::Dispatcher::callUnboxed<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::OperatorHandle const&, c10::TensorTypeId, at::Tensor const&, at::Tensor const&) const + 0x85 (0x42fd5d in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #17: at::Tensor::expand_as(at::Tensor const&) const + 0x1a5 (0x42ba47 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #18: main + 0xbd (0x427c97 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #19: __libc_start_main + 0xf0 (0x7f6a47ee8830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: _start + 0x29 (0x426999 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
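The error message hints at the rule: a dimension can only be expanded if its current size is 1. A shape-only sketch of that check in plain C++ follows, assuming the usual broadcasting convention that shapes are aligned from the last dimension; Shape and can_expand are hypothetical names, not libtorch API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<std::int64_t>;

// True if a tensor of shape `from` could be expanded to shape `to`:
// after aligning both shapes at the last dimension, every size in `from`
// must either equal the target size or be 1.
inline bool can_expand(const Shape& from, const Shape& to) {
    if (from.size() > to.size()) return false;
    const std::size_t shift = to.size() - from.size();
    for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] != 1 && from[i] != to[i + shift]) return false;
    }
    return true;
}
```

Under this rule {2,1} expands to {2,3}, while {2,2} and {2} do not, which agrees with the three cases tried above.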

14. Multiply mul_, divide div, subtract sub_

        boxes_my.select(1,0).mul_(width);
        boxes_my.select(1,1).mul_(height);
        boxes_my.select(1,2).mul_(width);
        boxes_my.select(1,3).mul_(height);

prediction.select(2, 3).div(2); // div returns a new tensor and leaves prediction unchanged; use div_ to divide in place

      input_tensor[0][0] = input_tensor[0][0].sub_(0.485).div_(0.229);
               input_tensor[0][1] = input_tensor[0][1].sub_(0.456).div_(0.224);
               input_tensor[0][2] = input_tensor[0][2].sub_(0.406).div_(0.225);

15. Loading a model

    torch::Device m_device(torch::kCUDA);
    torch::jit::script::Module m_model = torch::jit::load(path_pt);
    m_model.to(m_device);
    m_model.eval();

16. Results returned by the model's forward

When the model returns several outputs:

 auto output = m_model.forward({input_tensor});

    auto tpl = output.toTuple();
    auto arm_loc = tpl->elements()[0].toTensor();
    // arm_loc.print();
    //    std::cout<<arm_loc[0]<<std::endl;
    auto arm_conf = tpl->elements()[1].toTensor();
    //arm_conf.print();
    auto odm_loc = tpl->elements()[2].toTensor();
    //odm_loc.print();
    //     std::cout<<odm_loc[0]<<std::endl;
    auto odm_conf = tpl->elements()[3].toTensor();
    //    odm_conf.print();

17.resize_ zero_

Tensor & resize_(IntArrayRef size) const;
Tensor & zero_() const;

    torch::Tensor a = torch::rand({1,3,2,2});

    const int batch_size = a.size(0);
    const int depth = a.size(1);
    const int image_height = a.size(2);
    const int image_width = a.size(3);

    torch::Tensor crops = torch::rand({1,3,2,2});
    //    torch::Tensor crops;
    crops.resize_({ batch_size, depth, image_height, image_width });
    crops.zero_();

    std::cout<<a<<std::endl;
    std::cout<<crops<<std::endl;

(1,1,.,.) =
0.7889 0.3291
0.2541 0.8283

(1,2,.,.) =
0.0209 0.1846
0.2528 0.2755

(1,3,.,.) =
0.0294 0.6623
0.2736 0.3376
[ Variable[CPUFloatType]{1,3,2,2} ]
(1,1,.,.) =
0 0
0 0

(1,2,.,.) =
0 0
0 0

(1,3,.,.) =
0 0
0 0
[ Variable[CPUFloatType]{1,3,2,2} ]
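Note: if you only declare torch::Tensor crops; here instead of torch::Tensor crops = torch::rand({1,3,2,2}); an error is raised. It seems the tensor has to be initialized first so that memory is allocated; otherwise you get the following error:
terminate called after throwing an instance of 'c10::Error'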
what(): There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::resize_. This usually means that this function requires a non-empty list of Tensors. Available functions are [CUDATensorId, QuantizedCPUTensorId, CPUTensorId, VariableTensorId] (lookup_ at /data_2/everyday/0429/pytorch/torch/include/ATen/core/dispatch/DispatchTable.h:243)
frame #0: c10::Error::Error(c10::SourceLocation, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x6a (0x7fa2f5f450aa in /data_2/everyday/0429/pytorch/torch/lib/libc10.so)
frame #1: c10::KernelFunction const& c10::DispatchTable::lookup<c10::DispatchTable::lookup(c10::TensorTypeId) const::{lambda()#1}>(c10::DispatchTable::lookup(c10::TensorTypeId) const::{lambda()#1} const&) const + 0x1da (0x42eaa8 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #2: c10::DispatchTable::lookup(c10::TensorTypeId) const + 0x3a (0x42acf4 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #3: at::Tensor& c10::impl::OperatorEntry::callUnboxedOnly<at::Tensor&, at::Tensor&, c10::ArrayRef >(c10::TensorTypeId, at::Tensor&, c10::ArrayRef) const::{lambda(c10::DispatchTable const&)#1}::operator()(c10::DispatchTable const&) const + 0x51 (0x431543 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #4: std::result_of<at::Tensor& c10::impl::OperatorEntry::callUnboxedOnly<at::Tensor&, at::Tensor&, c10::ArrayRef >(c10::TensorTypeId, at::Tensor&, c10::ArrayRef) const::{lambda(c10::DispatchTable const&)#1} (c10::DispatchTable const&)>::type c10::LeftRightc10::DispatchTable::read<at::Tensor& c10::impl::OperatorEntry::callUnboxedOnly<at::Tensor&, at::Tensor&, c10::ArrayRef >(c10::TensorTypeId, at::Tensor&, c10::ArrayRef) const::{lambda(c10::DispatchTable const&)#1}>(at::Tensor& c10::impl::OperatorEntry::callUnboxedOnly<at::Tensor&, at::Tensor&, c10::ArrayRef >(c10::TensorTypeId, at::Tensor&, c10::ArrayRef) const::{lambda(c10::DispatchTable const&)#1}&&) const + 0x114 (0x4333c6 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #5: at::Tensor& c10::impl::OperatorEntry::callUnboxedOnly<at::Tensor&, at::Tensor&, c10::ArrayRef >(c10::TensorTypeId, at::Tensor&, c10::ArrayRef) const + 0x63 (0x4315c7 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #6: at::Tensor& c10::Dispatcher::callUnboxedOnly<at::Tensor&, at::Tensor&, c10::ArrayRef >(c10::OperatorHandle const&, c10::TensorTypeId, at::Tensor&, c10::ArrayRef) const + 0x7b (0x42eff5 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #7: at::Tensor::resize(c10::ArrayRef) const + 0x1a1 (0x42af3f in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #8: main + 0x134 (0x42798f in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)
frame #9: __libc_start_main + 0xf0 (0x7fa2f5618830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #10: _start + 0x29 (0x426719 in /data_2/everyday/0516/build-libtorch- syntax-unknown-Default/main)

18. meshgrid expands the input tensors into grids

static inline std::vector<Tensor> meshgrid(TensorList tensors)

    torch::Tensor scales = torch::ones({2});
    torch::Tensor ratios = torch::ones({2});
    ratios  += 2;

    std::cout<<scales<<std::endl;
    std::cout<<ratios<<std::endl;

    std::vector<torch::Tensor> mesh = torch::meshgrid({ scales, ratios });

    torch::Tensor scales_1 = mesh[0];
    torch::Tensor ratios_1 = mesh[1];

    std::cout<<scales_1<<std::endl;
    std::cout<<ratios_1<<std::endl;

1
1
[ Variable[CPUFloatType]{2} ]
3
3
[ Variable[CPUFloatType]{2} ]
1 1
1 1
[ Variable[CPUFloatType]{2,2} ]
3 3
3 3
[ Variable[CPUFloatType]{2,2} ]
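Because both inputs above contain repeated values, the output does not show which input varies along which axis. The plain C++ sketch below uses distinct values and assumes the row-major ("ij") convention, where the first grid repeats the first input down the rows and the second grid repeats the second input across the columns. Grids and meshgrid_ij are hypothetical names, not libtorch API.

```cpp
#include <cstddef>
#include <vector>

// The two output grids, each of size a.size() x b.size().
struct Grids {
    std::vector<std::vector<float>> first;
    std::vector<std::vector<float>> second;
};

// first[i][j] = a[i] (constant along each row),
// second[i][j] = b[j] (constant along each column).
inline Grids meshgrid_ij(const std::vector<float>& a, const std::vector<float>& b) {
    Grids g;
    g.first.assign(a.size(), std::vector<float>(b.size()));
    g.second.assign(a.size(), std::vector<float>(b.size()));
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            g.first[i][j] = a[i];
            g.second[i][j] = b[j];
        }
    }
    return g;
}
```

For a = {1,2} and b = {10,20,30}, the first grid is {{1,1,1},{2,2,2}} and the second is {{10,20,30},{10,20,30}}.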

19. flatten flattens a tensor

Tensor flatten(int64_t start_dim=0, int64_t end_dim=-1) const;
Tensor flatten(int64_t start_dim, int64_t end_dim, Dimname out_dim) const;
Tensor flatten(Dimname start_dim, Dimname end_dim, Dimname out_dim) const;
Tensor flatten(DimnameList dims, Dimname out_dim) const;

   torch::Tensor a = torch::rand({2,3});
    torch::Tensor b = a.flatten();
    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;

0.9953 0.1461 0.0084
0.6169 0.4037 0.7685
[ Variable[CPUFloatType]{2,3} ]
0.9953
0.1461
0.0084
0.6169
0.4037
0.7685

20. fill_ fills a tensor with a value in place, modifying the current tensor

Tensor & fill_(Scalar value) const;
Tensor & fill_(const Tensor & value) const;

    torch::Tensor a = torch::rand({2,3});
    torch::Tensor b = a.fill_(4);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;

4 4 4
4 4 4
[ Variable[CPUFloatType]{2,3} ]
4 4 4
4 4 4
[ Variable[CPUFloatType]{2,3} ]

21.torch::stack

static inline Tensor stack(TensorList tensors, int64_t dim)

    torch::Tensor a = torch::rand({3});
    torch::Tensor b = torch::rand({3});
    torch::Tensor c = torch::stack({a,b},1);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<c<<std::endl;

0.6776
0.5610
0.2835
[ Variable[CPUFloatType]{3} ]
0.6846
0.3753
0.3873
[ Variable[CPUFloatType]{3} ]
0.6776 0.6846
0.5610 0.3753
0.2835 0.3873
[ Variable[CPUFloatType]{3,2} ]

    torch::Tensor a = torch::rand({3});
    torch::Tensor b = torch::rand({3});
    torch::Tensor c = torch::stack({a,b},0);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<c<<std::endl;

0.7129
0.1650
0.6764
[ Variable[CPUFloatType]{3} ]
0.8035
0.1807
0.8100
[ Variable[CPUFloatType]{3} ]
0.7129 0.1650 0.6764
0.8035 0.1807 0.8100
[ Variable[CPUFloatType]{2,3} ]

22.reshape

inline Tensor Tensor::reshape(IntArrayRef shape) const

    torch::Tensor a = torch::rand({2,4});
    torch::Tensor b = a.reshape({-1,2});
    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;

0.3782 0.6390 0.6919 0.8298
0.3872 0.5923 0.4337 0.9634
[ Variable[CPUFloatType]{2,4} ]
0.3782 0.6390
0.6919 0.8298
0.3872 0.5923
0.4337 0.9634
[ Variable[CPUFloatType]{4,2} ]

23. view

inline Tensor Tensor::view(IntArrayRef size) const

view requires a contiguous tensor, so call contiguous() first:
a.contiguous().view({-1, 4});

 torch::Tensor a = torch::rand({2,3});
    torch::Tensor b = a.contiguous().view({ -1, 6 });
    torch::Tensor c = a.contiguous().view({ 3, 2 });

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<c<<std::endl;

0.2069 0.8814 0.8506
0.6451 0.0107 0.7591
[ Variable[CPUFloatType]{2,3} ]
0.2069 0.8814 0.8506 0.6451 0.0107 0.7591
[ Variable[CPUFloatType]{1,6} ]
0.2069 0.8814
0.8506 0.6451
0.0107 0.7591
[ Variable[CPUFloatType]{3,2} ]
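Note that this is not a transpose: view keeps the element order of the underlying buffer and only regroups it into rows.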
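The difference can be checked without libtorch. Below is a minimal sketch in plain C++ on a row-major buffer; view_at and transpose_at are hypothetical helper names made up for illustration, not libtorch functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A 2x3 tensor stored row-major, as a contiguous tensor would be.
// Viewing it with `new_cols` columns keeps the buffer order and only
// changes how (r, c) maps to a flat offset.
float view_at(const std::vector<float>& buf, std::size_t new_cols,
              std::size_t r, std::size_t c) {
    std::size_t off = r * new_cols + c;
    assert(off < buf.size());
    return buf[off];
}

// Transpose of a matrix with `cols` columns: element (r, c) of the result
// is element (c, r) of the original.
float transpose_at(const std::vector<float>& buf, std::size_t cols,
                   std::size_t r, std::size_t c) {
    std::size_t off = c * cols + r;
    assert(off < buf.size());
    return buf[off];
}
```

With the values printed above, row 1 of the {3,2} view starts with 0.8506 (the third element of the buffer), whereas row 1 of a transpose would start with 0.8814.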

24.argmax argmin

static inline Tensor argmax(const Tensor & self, c10::optional<int64_t> dim=c10::nullopt, bool keepdim=false);
static inline Tensor argmin(const Tensor & self, c10::optional<int64_t> dim=c10::nullopt, bool keepdim=false);

    torch::Tensor a = torch::rand({2,3});
    auto b = torch::argmax(a, 0);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;

0.9337 0.7443 0.1323
0.6514 0.5068 0.5052
[ Variable[CPUFloatType]{2,3} ]
0
0
1
[ Variable[CPULongType]{3} ]

    torch::Tensor a = torch::rand({2,3});
    auto b = torch::argmax(a, 1);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;

0.0062 0.3846 0.4844
0.9555 0.2844 0.4025
[ Variable[CPUFloatType]{2,3} ]
2
0
[ Variable[CPULongType]{2} ]

25.where

static inline Tensor where(const Tensor & condition, const Tensor & self, const Tensor & other);
static inline std::vector<Tensor> where(const Tensor & condition);

torch::Tensor d = torch::where(a>0.5,b,c);
Explanation: call the positions where a is greater than 0.5 pos. At those positions d takes its values from b; everywhere else it takes them from c.

     
    torch::Tensor a = torch::rand({2,3});
    torch::Tensor b = torch::ones({2,3});
    torch::Tensor c = torch::zeros({2,3});

    torch::Tensor d = torch::where(a>0.5,b,c);
    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<c<<std::endl;
    std::cout<<d<<std::endl;

0.7301 0.8926 0.9570
0.0979 0.5679 0.4473
[ Variable[CPUFloatType]{2,3} ]
1 1 1
1 1 1
[ Variable[CPUFloatType]{2,3} ]
0 0 0
0 0 0
[ Variable[CPUFloatType]{2,3} ]
1 1 1
0 1 0
[ Variable[CPUFloatType]{2,3} ]

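Another example: given only a condition, where returns one index tensor per dimension (row indices first, then column indices) for the elements that satisfy it: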
auto b = torch::where(a>0.5);


    torch::Tensor a = torch::rand({2,3});
    auto b = torch::where(a>0.5);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;

0.3439 0.1622 0.7149
0.4845 0.5982 0.9443
[ Variable[CPUFloatType]{2,3} ]
0
1
1
[ Variable[CPULongType]{3} ]
2
1
2
[ Variable[CPULongType]{3} ]

26.accessor

TensorAccessor<T,N> accessor() const&
auto result_data = result.accessor<float, 2>(); // 2 means two-dimensional
Example 1:

torch::Tensor one = torch::randn({9,6});
auto foo_one=one.accessor<float,2>();
float sum = 0; // accumulate as float; declared outside the loop so it is usable afterwards
for(int i=0;i<foo_one.size(0);i++)
 for(int j=0;j<foo_one.size(1);j++)
     sum+=foo_one[i][j];

Example 2:

 torch::Tensor result;
    for(int i=1;i<m_num_class;i++) 
    {
        //...
        if(0 == result.numel())
        {
            result = result_.clone();
        }else
        {
            result = torch::cat({result,result_},0);// concatenate along rows (dim 0)
        }
    }
    result =result.cpu();
    auto result_data = result.accessor<float, 2>();
    
    cv::Mat img_draw = img.clone();
    for(int i=0;i<result_data.size(0);i++)
    {
        float score = result_data[i][4];
        if(score < 0.4) { continue;}
        int x1 = result_data[i][0];
        int y1 = result_data[i][1];
        int x2 = result_data[i][2];
        int y2 = result_data[i][3];
        int id_label = result_data[i][5];
        
        cv::rectangle(img_draw,cv::Point(x1,y1),cv::Point(x2,y2),cv::Scalar(255,0,0),3);
        cv::putText(img_draw,label_map[id_label],cv::Point(x1,y2),CV_FONT_HERSHEY_SIMPLEX,1,cv::Scalar(255,0,55));
    }

27. torch::max (torch::min works the same way)

static inline std::tuple<Tensor,Tensor> max(const Tensor & self, int64_t dim, bool keepdim=false);
static inline std::tuple<Tensor,Tensor> max(const Tensor & self, Dimname dim, bool keepdim=false);
static inline Tensor max(const Tensor & self);

    torch::Tensor a = torch::rand({4,2});
    std::tuple<torch::Tensor, torch::Tensor> max_test = torch::max(a,1);

    auto max_val = std::get<0>(max_test);
    // index
    auto index = std::get<1>(max_test);

    std::cout<<a<<std::endl;
    std::cout<<max_val<<std::endl;
     std::cout<<index<<std::endl;

0.1082 0.7954
0.3099 0.4507
0.2447 0.5169
0.8210 0.3141
[ Variable[CPUFloatType]{4,2} ]
0.7954
0.4507
0.5169
0.8210
[ Variable[CPUFloatType]{4} ]
1
1
1
0
[ Variable[CPULongType]{4} ]

Another example: global max

    torch::Tensor a = torch::rand({4,2});
    torch::Tensor max_test = torch::max(a);

    std::cout<<a<<std::endl;
    std::cout<<max_test<<std::endl;

0.1904 0.9493
0.6521 0.5788
0.9216 0.5997
0.1758 0.7384
[ Variable[CPUFloatType]{4,2} ]
0.94929
[ Variable[CPUFloatType]{} ]

28.masked_select 与 masked_fill

28.1 Tensor masked_select(const Tensor & mask) const;

       torch::Tensor a = torch::rand({2,3});
    torch::Tensor c = (a>0.25);
    torch::Tensor d = a.masked_select(c);

    std::cout<<a<<std::endl;
    std::cout<<c<<std::endl;
    std::cout<<d<<std::endl;

0.0667 0.3812 0.3810
0.3558 0.8628 0.6329
[ Variable[CPUFloatType]{2,3} ]
0 1 1
1 1 1
[ Variable[CPUBoolType]{2,3} ]
0.3812
0.3810
0.3558
0.8628
0.6329
[ Variable[CPUFloatType]{5} ]

28.2 Tensor masked_fill(const Tensor & mask, Scalar value) const;

Tensor & masked_fill_(const Tensor & mask, const Tensor & value) const;
Tensor masked_fill(const Tensor & mask, const Tensor & value) const;

    torch::Tensor a = torch::rand({2,3});
    torch::Tensor aa = a.clone();
    aa.masked_fill_(aa>0.5,-2);

    std::cout<<a<<std::endl;
    std::cout<<aa<<std::endl;

0.8803 0.2387 0.8577
0.8166 0.0730 0.4682
[ Variable[CPUFloatType]{2,3} ]
-2.0000 0.2387 -2.0000
-2.0000 0.0730 0.4682
[ Variable[CPUFloatType]{2,3} ]

28.3 masked_fill_ (functions with a trailing underscore operate in place)

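One requirement I had: Tensor score holds the scores and Tensor label holds the labels, and both have the same size. The post-processing step is: wherever label == 26 and the score at that position is below 0.5, set the label at that position to 1. The demo below uses the same pattern with different values (label == 3, score >= 0.9, fill value -1).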

 float index[] = {3,2,3,3,5,6,7,8,9,10,11,12,13,14,15,16};
    float score[] = {0.1,0.1,0.9,0.9,0.9,0.1,0.1,0.1,0.1,0.1,0.8,0.8,0.8,0.8,0.8,0.8};

    torch::Tensor aa = torch::from_blob(index, {4,4}).toType(torch::kFloat32);
    torch::Tensor bb = torch::from_blob(score, {4,4}).toType(torch::kFloat32);
    std::cout<<aa<<std::endl;
    std::cout<<bb<<std::endl;

    torch::Tensor tmp = (aa == 3);
    torch::Tensor tmp_2 = (bb >= 0.9);
    std::cout<<tmp<<std::endl;
    std::cout<<tmp_2<<std::endl;
    torch::Tensor condition_111 = tmp * tmp_2;

    std::cout<<condition_111<<std::endl;
    aa.masked_fill_(condition_111,-1);

     std::cout<<aa<<std::endl;

Output:
3 2 3 3
5 6 7 8
9 10 11 12
13 14 15 16
[ Variable[CPUFloatType]{4,4} ]
0.1000 0.1000 0.9000 0.9000
0.9000 0.1000 0.1000 0.1000
0.1000 0.1000 0.8000 0.8000
0.8000 0.8000 0.8000 0.8000
[ Variable[CPUFloatType]{4,4} ]
1 0 1 1
0 0 0 0
0 0 0 0
0 0 0 0
[ Variable[CPUByteType]{4,4} ]
0 0 1 1
1 0 0 0
0 0 0 0
0 0 0 0
[ Variable[CPUByteType]{4,4} ]
0 0 1 1
0 0 0 0
0 0 0 0
0 0 0 0
[ Variable[CPUByteType]{4,4} ]
3 2 -1 -1
5 6 7 8
9 10 11 12
13 14 15 16
[ Variable[CPUFloatType]{4,4} ]

29. libtorch combined example 1

   torch::jit::script::Module module = torch::jit::load(argv[1]);
    std::cout << "== Switch to GPU mode" << std::endl;
    // to GPU
    module.to(at::kCUDA);

    if (LoadImage(file_name, image)) {
            auto input_tensor = torch::from_blob(
                    image.data, {1, kIMAGE_SIZE, kIMAGE_SIZE, kCHANNELS});
            input_tensor = input_tensor.permute({0, 3, 1, 2});
            input_tensor[0][0] = input_tensor[0][0].sub_(0.485).div_(0.229);
            input_tensor[0][1] = input_tensor[0][1].sub_(0.456).div_(0.224);
            input_tensor[0][2] = input_tensor[0][2].sub_(0.406).div_(0.225);

            // to GPU
            input_tensor = input_tensor.to(at::kCUDA);

            torch::Tensor out_tensor = module.forward({input_tensor}).toTensor();

            auto results = out_tensor.sort(-1, true);
            auto softmaxs = std::get<0>(results)[0].softmax(0);
            auto indexs = std::get<1>(results)[0];

            for (int i = 0; i < kTOP_K; ++i) {
                auto idx = indexs[i].item<int>();
                std::cout << "    ============= Top-" << i + 1
                          << " =============" << std::endl;
                std::cout << "    Label:  " << labels[idx] << std::endl;
                std::cout << "    With Probability:  "
                          << softmaxs[i].item<float>() * 100.0f << "%" << std::endl;
            }

        }

30.pytorch nms <---------> libtorch nms

pytorch nms
For example:
boxes [1742,4]
scores [1742]

def nms(boxes, scores, overlap=0.5, top_k=200):
    """Apply non-maximum suppression at test time to avoid detecting too many
    overlapping bounding boxes for a given object.
    Args:
        boxes: (tensor) The location preds for the img, Shape: [num_priors,4].
        scores: (tensor) The class predscores for the img, Shape:[num_priors].
        overlap: (float) The overlap thresh for suppressing unnecessary boxes.
        top_k: (int) The Maximum number of box preds to consider.
    Return:
        The indices of the kept boxes with respect to num_priors.
    """
    keep = scores.new(scores.size(0)).zero_().long()
    if boxes.numel() == 0:
        return keep
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = torch.mul(x2 - x1, y2 - y1)
    v, idx = scores.sort(0)  # sort in ascending order
    # I = I[v >= 0.01]
    idx = idx[-top_k:]  # indices of the top-k largest vals
    xx1 = boxes.new()
    yy1 = boxes.new()
    xx2 = boxes.new()
    yy2 = boxes.new()
    w = boxes.new()
    h = boxes.new()

    # keep = torch.Tensor()
    count = 0
    while idx.numel() > 0:
        i = idx[-1]  # index of current largest val
        # keep.append(i)
        keep[count] = i
        count += 1
        if idx.size(0) == 1:
            break
        idx = idx[:-1]  # remove kept element from view
        # load bboxes of next highest vals
        torch.index_select(x1, 0, idx, out=xx1)
        torch.index_select(y1, 0, idx, out=yy1)
        torch.index_select(x2, 0, idx, out=xx2)
        torch.index_select(y2, 0, idx, out=yy2)
        # store element-wise max with next highest score
        xx1 = torch.clamp(xx1, min=x1[i])
        yy1 = torch.clamp(yy1, min=y1[i])
        xx2 = torch.clamp(xx2, max=x2[i])
        yy2 = torch.clamp(yy2, max=y2[i])
        w.resize_as_(xx2)
        h.resize_as_(yy2)
        w = xx2 - xx1
        h = yy2 - yy1
        # check sizes of xx1 and xx2.. after each iteration
        w = torch.clamp(w, min=0.0)
        h = torch.clamp(h, min=0.0)
        inter = w*h
        # IoU = i / (area(a) + area(b) - i)
        rem_areas = torch.index_select(area, 0, idx)  # load remaining areas)
        union = (rem_areas - inter) + area[i]
        IoU = inter/union  # store result in iou
        # keep only elements with an IoU <= overlap
        idx = idx[IoU.le(overlap)]
    return keep, count

libtorch nms

bool nms(const torch::Tensor& boxes, const torch::Tensor& scores, torch::Tensor &keep, int &count,float overlap, int top_k)
{
    count =0;
    keep = torch::zeros({scores.size(0)}).to(torch::kLong).to(scores.device());
    if(0 == boxes.numel())
    {
        return false;
    }

    torch::Tensor x1 = boxes.select(1,0).clone();
    torch::Tensor y1 = boxes.select(1,1).clone();
    torch::Tensor x2 = boxes.select(1,2).clone();
    torch::Tensor y2 = boxes.select(1,3).clone();
    torch::Tensor area = (x2-x1)*(y2-y1);
    //    std::cout<<area<<std::endl;

    std::tuple<torch::Tensor,torch::Tensor> sort_ret = torch::sort(scores.unsqueeze(1), 0, 0);
    torch::Tensor v = std::get<0>(sort_ret).squeeze(1).to(scores.device());
    torch::Tensor idx = std::get<1>(sort_ret).squeeze(1).to(scores.device());

    int num_ = idx.size(0);
    if(num_ > top_k) //python:idx = idx[-top_k:]
    {
        idx = idx.slice(0,num_-top_k,num_).clone();
    }
    torch::Tensor xx1,yy1,xx2,yy2,w,h;
    while(idx.numel() > 0)
    {
        auto i = idx[-1];
        keep[count] = i;
        count += 1;
        if(1 == idx.size(0))
        {
            break;
        }
        idx = idx.slice(0,0,idx.size(0)-1).clone();

        xx1 = x1.index_select(0,idx);
        yy1 = y1.index_select(0,idx);
        xx2 = x2.index_select(0,idx);
        yy2 = y2.index_select(0,idx);

        xx1 = xx1.clamp(x1[i].item().toFloat(),INT_MAX*1.0);
        yy1 = yy1.clamp(y1[i].item().toFloat(),INT_MAX*1.0);
        xx2 = xx2.clamp(INT_MIN*1.0,x2[i].item().toFloat());
        yy2 = yy2.clamp(INT_MIN*1.0,y2[i].item().toFloat());

        w = xx2 - xx1;
        h = yy2 - yy1;

        w = w.clamp(0,INT_MAX);
        h = h.clamp(0,INT_MAX);

        torch::Tensor inter = w * h;
        torch::Tensor rem_areas = area.index_select(0,idx);

        torch::Tensor union_ = (rem_areas - inter) + area[i];
        torch::Tensor Iou = inter * 1.0 / union_;
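        torch::Tensor index_small = Iou <= overlap; // <= to match IoU.le(overlap) in the python version
        auto mask_idx = torch::nonzero(index_small).squeeze(1); // squeeze(1) keeps mask_idx 1-D even with a single match
        idx = idx.index_select(0,mask_idx);//python: idx = idx[IoU.le(overlap)]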
    }
    return true;
}
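The IoU arithmetic inside the loop above is easier to check on a single pair of boxes. Here is a minimal sketch in plain C++ (no libtorch); the Box struct and the iou function are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>

struct Box { float x1, y1, x2, y2; };

// Intersection-over-union of two axis-aligned boxes, using the same
// formula as the loop above: inter / (area_a + area_b - inter).
float iou(const Box& a, const Box& b) {
    assert(a.x2 >= a.x1 && a.y2 >= a.y1 && b.x2 >= b.x1 && b.y2 >= b.y1);
    // Clamp the overlap width/height at 0 so disjoint boxes give inter = 0.
    float w = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float h = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = w * h;
    float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}
```

For the boxes (0,0,2,2) and (1,1,3,3) the overlap is a 1x1 square, so the IoU is 1 / (4 + 4 - 1) = 1/7, which falls below the default overlap threshold of 0.5 and both boxes would be kept.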

31. Data types matter! .to(torch::kByte);

31.1

    //[128,512]
    torch::Tensor b = torch::argmax(output_1, 2).cpu();
    //    std::cout<<b<<std::endl;
    b.print();

    cv::Mat mask(T_height, T_width, CV_8UC1, (uchar*)b.data_ptr());
    imshow("mask",mask*255);
    waitKey(0);

[Variable[CPULongType] [128, 512]]

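As above, b is the segmentation map [128, 512], but it simply would not display! I compared the values of b against pytorch and they matched, yet the image came out entirely black (all zeros), even though printing the values showed non-zero entries.
An earlier project was written the same way. I then searched GitHub for a libtorch implementation of PSENet and found that it uses a similar approach: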

 cv::Mat tempImg = Mat::zeros(T_height, T_width, CV_8UC1);
 memcpy((void *) tempImg.data, b.data_ptr(), sizeof(torch::kU8) * b.numel());

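I wrote it the same way and it still did not work. Two hours went by. Out of options, I was about to dump the 128*512 values into a spreadsheet to inspect them, and tried a few things at random:
cout<<b[0][0].item().toFloat()<<endl;
This prints the value, but only with .toFloat() appended. I then aimlessly wrote a loop:
for(int i=0;i<128;i++)
for(int j=0;j<512;j++)
{
}
But I was not convinced. The values were right, so why would nothing display?
Since b[0][0].item().toFloat() needs the .toFloat(), what type is b? It is a tensor, but with which element type? The printout [Variable[CPULongType] [128, 512]] says Long.
So I tried converting the type. From earlier code, appending something like .to(torch::kFloat32); to the tensor is enough.
I wanted integers, so I tried int first:
torch::Tensor b = torch::argmax(output_1, 2).cpu().to(torch::kInt);
That still did not work.
.to(torch::kFloat32); did not work either.
While I was typing torch::k, the editor's autocomplete listed the names starting with k. The first one was kByte, so I tried:
torch::Tensor b = torch::argmax(output_1, 2).cpu().to(torch::kByte);
That worked! The segmentation map I wanted appeared.
It was a data type problem, and it cost me at least two hours. The reason: a CV_8UC1 Mat reads one byte per pixel from the buffer, so the tensor must hold 1-byte elements (kByte). Long is 8 bytes per element; Int and Float32 are 4.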
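The byte-width mismatch can be reproduced without libtorch or OpenCV. The sketch below is plain C++ under the assumption that the labels fit in 0..255; labels_to_bytes and reinterpret_prefix are hypothetical helper names, standing in for .to(torch::kByte) and for wrapping the Long buffer in a CV_8UC1 Mat respectively.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert 64-bit labels (the element type argmax returns) to 8-bit pixels,
// one value per pixel. This mirrors what .to(torch::kByte) does.
std::vector<std::uint8_t> labels_to_bytes(const std::vector<std::int64_t>& labels) {
    std::vector<std::uint8_t> out;
    out.reserve(labels.size());
    for (std::int64_t v : labels) {
        assert(v >= 0 && v <= 255);  // must fit in one byte
        out.push_back(static_cast<std::uint8_t>(v));
    }
    return out;
}

// Read the first N bytes of the int64 buffer as if they were N pixels.
// Each label occupies 8 bytes, so most of these "pixels" are the zero
// padding bytes of the 64-bit values, not labels.
std::vector<std::uint8_t> reinterpret_prefix(const std::vector<std::int64_t>& labels) {
    const unsigned char* raw = reinterpret_cast<const unsigned char*>(labels.data());
    return std::vector<std::uint8_t>(raw, raw + labels.size());
}
```

For four labels that are all 1, labels_to_bytes gives four 1s, while reinterpret_prefix gives mostly zeros, which is why the displayed mask looked black.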

31.2
An intermediate image needs to be converted to a tensor:

 Mat m_tmp = grayMat.clone();
    torch::Tensor label_deal = torch::from_blob(
                m_tmp.data, {grayMat.rows, grayMat.cols}).toType(torch::kByte).to(m_device);
//    label_deal = label_deal.to(m_device);
    auto aaa = torch::max(label_deal);
    std::cout<<label_deal<<std::endl;
    std::cout<<aaa<<std::endl;
    while(1);

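Another big pitfall! I assumed this was fine, but the downstream results were wrong, and tracing back step by step led here: the pixel values of m_tmp did not match the tensor at all. I knew the largest pixel value in m_tmp was 34, yet the largest value printed from the tensor was 255.
Was it the torch::kByte type? Switching to kFloat32 did not help; the values were even more absurd, with nan among them. I then noticed there are two spellings, .toType(torch::kByte) and .to(torch::kByte), and wondered whether they differ; both gave the same wrong result. Pulling .to(m_device); out into its own statement did not help either (from earlier experience, torch::Tensor tmp = tmp.cpu(); seemed to need a statement of its own). So why could the pixel values not be loaded into the tensor correctly?
After being stuck for a long while, it occurred to me that the Mat type might need converting as well.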

 Mat m_tmp = grayMat.clone();
    m_tmp.convertTo(m_tmp,CV_32FC1);// another big pitfall: the image must be converted to float32 first
    torch::Tensor label_deal = torch::from_blob(
                m_tmp.data, {grayMat.rows, grayMat.cols}).toType(torch::kByte).to(m_device);

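That works. Does it have to be CV_32FC1? Not strictly: from_blob without an options argument treats the buffer as float32, so the Mat's element type has to match that. Passing torch::kByte as the third argument of from_blob should allow a CV_8UC1 Mat to be wrapped directly.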

32. Accessing tensor data through a pointer

        torch::Tensor output = m_model->forward({input_tensor}).toTensor()[0];
        torch::Tensor output_cpu = output.cpu();
        //output_cpu     Variable[CPUFloatType] [26, 480, 480]]
        output_cpu.print();

        void *ptr = output_cpu.data_ptr();
        //std::cout<<(float*)ptr[0]<<std::endl;

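Only void * or auto can be used for the declaration; anything else is an error. For example, float *ptr = output_cpu.data_ptr(); fails with:
error: invalid conversion from ‘void*’ to ‘float*’ [-fpermissive]
float *ptr = output_cpu.data_ptr();
void * compiles, but I still need to read the tensor's data through the pointer.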

torch::Tensor output = m_model->forward({input_tensor}).toTensor()[0];
        torch::Tensor output_cpu = output.cpu();
        //output_cpu     Variable[CPUFloatType] [26, 480, 480]]
        output_cpu.print();
        void *ptr = output_cpu.data_ptr();
        std::cout<<(float*)ptr<<std::endl;

Written as above, the output is:

[Variable[CPUFloatType] [26, 480, 480]]
0x7fab195ee040

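That prints an address. So how to read the data? The obvious attempt is:
std::cout<<(float)ptr[0]<<std::endl;
which is again an error:
error: ‘void*’ is not a pointer-to-object type
Then I tried:
std::cout<<(float*)ptr[0][0][0]<<std::endl;
and got the same error. The subscript is applied to the void * before the cast, and a void * cannot be subscripted. Googling turned up someone with the same error, along with the solution:

Sure enough, that solved it: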

        void *ptr = output_cpu.data_ptr();
//        std::cout<<*((float*)ptr[0][0][0])<<std::endl;
//        std::cout<<(float*)ptr[0][0][0]<<std::endl;

         std::cout<<*((float*)ptr+2)<<std::endl; // cast first, then offset: reads the third float (ptr+2 on a void* would move 2 bytes)

Another way to write it:

const float* result = reinterpret_cast<const float *>(output_cpu.data_ptr());

And the earlier form again:

 void *ptr = output_cpu.data_ptr();
 const float* result = (float*)ptr;
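Once the pointer is typed, indexing a [C, H, W] tensor comes down to one flat offset. The following is a minimal sketch in plain C++, assuming the buffer is contiguous and holds float32 (as the CPUFloatType printout above suggests); at_chw is a hypothetical helper name, not a libtorch function.

```cpp
#include <cassert>
#include <cstddef>

// Element (c, h, w) of a contiguous float tensor of shape [C, H, W],
// read through a typed pointer. Arithmetic on float* advances in units
// of sizeof(float), which is why the cast has to come before the offset.
float at_chw(const void* data, std::size_t C, std::size_t H, std::size_t W,
             std::size_t c, std::size_t h, std::size_t w) {
    assert(c < C && h < H && w < W);
    const float* p = static_cast<const float*>(data);
    return p[(c * H + h) * W + w];
}
```

With the [26, 480, 480] output above, a call would look like at_chw(output_cpu.data_ptr(), 26, 480, 480, c, h, w). This only holds for a contiguous CPU tensor.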

33 Comparing ways to assign to a Tensor by index in PyTorch

Comparing ways to assign to a Tensor by index in PyTorch [https://www.jianshu.com/p/e568213c8501]

44 Returning multiple tensors (pytorch side) and unpacking them (libtorch side)

Output on the pytorch side:

    def forward(self, x, batch=None):
        output, cnn_feature = self.dla(x)
        return (output['ct_hm'],output['wh'],cnn_feature)

The corresponding libtorch side:

    auto out = m_model->forward({input_tensor});
    auto tpl = out.toTuple();
    auto out_ct_hm = tpl->elements()[0].toTensor();
    out_ct_hm.print();
    auto out_wh = tpl->elements()[1].toTensor();
    out_wh.print();
    auto out_cnn_feature = tpl->elements()[2].toTensor();
    out_cnn_feature.print();

For a single output tensor it is simply:

at::Tensor output = module->forward(inputs).toTensor();

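45. torch::Tensor as a function parameter: whether passed by reference or by value, in-place operations on the parameter inside the function affect the original tensor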

void test_tensor(torch::Tensor a)
{
    a[0][0] = -100;

}

int main(int argc, const char* argv[])
{

    torch::Tensor p = torch::rand({2,2});
    std::cout<<p<<std::endl;
    std::cout<<"~~~~#########~~~~~~~~~~~~~~~~~~~~~~~~~~"<<std::endl;
    test_tensor(p);
    std::cout<<p<<std::endl;
    while (1);
}

Output:

 0.0509  0.3509
 0.8019  0.1350
[ Variable[CPUType]{2,2} ]
~~~~#########~~~~~~~~~~~~~~~~~~~~~~~~~~
-100.0000    0.3509
   0.8019    0.1350
[ Variable[CPUType]{2,2} ]

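As shown, void test_tensor(torch::Tensor a) does not take a reference, yet the values changed after the call. A torch::Tensor is a handle to shared storage: passing it by value copies the handle, not the data, so in-place writes reach the caller's tensor. Assigning a new tensor to the parameter (a = ...) only rebinds the local handle and leaves the caller's tensor alone.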
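The behaviour resembles copying a std::shared_ptr. The sketch below is an analogy in plain C++, not libtorch's actual implementation; Handle, write_in_place and rebind are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for a tensor handle: copying the handle copies the pointer,
// not the buffer it points to.
struct Handle {
    std::shared_ptr<std::vector<float>> storage;
};

// Pass by value, then write through the handle: the caller sees the change,
// because both handles share one buffer.
void write_in_place(Handle h) {
    assert(h.storage && !h.storage->empty());
    (*h.storage)[0] = -100.0f;
}

// Pass by value, then rebind the local handle to a new buffer: the caller's
// handle still points at the old buffer and is unaffected.
void rebind(Handle h) {
    h.storage = std::make_shared<std::vector<float>>(1, 42.0f);
}
```

In these terms, a[0][0] = -100 in test_tensor behaves like write_in_place. To get an independent copy inside a function, clone() the tensor first, as the other examples in this post do.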

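46. Implementing pytorch's fancy indexing

On the pytorch side, the code is:

c=b[a]

Here a has shape [1,100] and b has shape [1,100,40,2]. Guess the shape of c. One more given: a acts as a mask whose values are only 0 or 1; suppose its first 5 values are 1 and the rest are 0.
The resulting c has shape [5,40,2]. As you might guess, the rows where the mask is 1 are taken and the rest are dropped. So how can this be done elegantly in libtorch?
I could not think of a good way at first, because libtorch does not support this subscript syntax, which makes it awkward. I implemented it with my own loops (section 52 below gives a loop-free version):
To make the values easy to read, only 10 entries are used.

// aim [1,10,2,2]   ind_mask_ [1,10], e.g. the first 5 are 1 and the rest 0, giving a result of shape [5,2,2]; equivalent to the pytorch operation aim = aim[ind_mask]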
torch::Tensor deal_mask_index22(torch::Tensor aim_,torch::Tensor ind_mask_)
{
    torch::Tensor aim = aim_.clone().squeeze(0);//[1,100,40,2]  -->> [100,40,2]
    torch::Tensor ind_mask = ind_mask_.clone().squeeze(0);//[1,100]  -->> [100]
    int row = ind_mask.size(0);
    int cnt = 0;
    for(int i=0;i<row;i++)
    {
        if(ind_mask[i].item().toInt())
        {
            cnt += 1;
        }
    }
    torch::Tensor out = torch::zeros({cnt,aim.size(1),aim.size(2)});
    int index_ = 0;
    for(int i=0;i<row;i++)
    {
        if(ind_mask[i].item().toInt())
        {
            out[index_++] = aim[i];
//            std::cout<<i<<std::endl;
        }
    }

    std::cout<<"##############################################"<<std::endl;
    std::cout<<out<<std::endl;
    
    return out;
}

int main(int argc, const char* argv[])
{
    torch::Tensor ind_mask = torch::ones({1,10});
    ind_mask[0][0] = 0;
    ind_mask[0][1] = 0;
    ind_mask[0][2] = 0;
    ind_mask[0][4] = 0;

    torch::Tensor aim = torch::rand({1,10,2,2});
    std::cout<<aim<<std::endl;

    deal_mask_index22(aim,ind_mask);


    while (1);
}

47. Verifying tensor precision between pytorch and libtorch

Verifying tensor precision between pytorch and libtorch:
https://www.cnblogs.com/yanghailin/p/13669046.html

48. Miscellaneous: color mapping

    auto t1 = std::chrono::steady_clock::now();
//    static torch::Tensor tensor_m0 = torch::zeros({m_height,m_width}).to(torch::kByte).to(torch::kCPU);
//    static torch::Tensor tensor_m1 = torch::zeros({m_height,m_width}).to(torch::kByte).to(torch::kCPU);
//    static torch::Tensor tensor_m2 = torch::zeros({m_height,m_width}).to(torch::kByte).to(torch::kCPU);

    static torch::Tensor tensor_m0 = torch::zeros({m_height,m_width}).to(torch::kByte);
    static torch::Tensor tensor_m1 = torch::zeros({m_height,m_width}).to(torch::kByte);
    static torch::Tensor tensor_m2 = torch::zeros({m_height,m_width}).to(torch::kByte);
    tensor_m0 = tensor_m0.to(torch::kCUDA);
    tensor_m1 = tensor_m1.to(torch::kCUDA);
    tensor_m2 = tensor_m2.to(torch::kCUDA);
    for(int i=1;i<m_color_cnt;i++)
    {
        tensor_m0.masked_fill_(index==i,colormap[i * 3]);
        tensor_m1.masked_fill_(index==i,colormap[i * 3 + 1]);
        tensor_m2.masked_fill_(index==i,colormap[i * 3 + 2]);
    }
    torch::Tensor tensor_m00 = tensor_m0.cpu();
    torch::Tensor tensor_m11 = tensor_m1.cpu();
    torch::Tensor tensor_m22 = tensor_m2.cpu();
    cv::Mat m0 = cv::Mat(m_height, m_width, CV_8UC1, (uchar*)tensor_m00.data_ptr());
    cv::Mat m1 = cv::Mat(m_height, m_width, CV_8UC1, (uchar*)tensor_m11.data_ptr());
    cv::Mat m2 = cv::Mat(m_height, m_width, CV_8UC1, (uchar*)tensor_m22.data_ptr());
    std::vector<cv::Mat> channels = {m0,m1,m2};
    cv::Mat mergeImg;
    cv::merge(channels, mergeImg);
    mergeImg = mergeImg.clone();
    auto ttt1 = std::chrono::duration_cast<std::chrono::milliseconds>
            (std::chrono::steady_clock::now() - t1).count();
    std::cout << "merge time="<<ttt1<<"ms"<<std::endl;

On the CPU this takes about 35 ms, and on the GPU 2-3 ms. The code below does the same job and also takes 2-3 ms:

 auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i<labelMat.rows; i++)
    {
        for (int j = 0; j<labelMat.cols; j++)
        {
            int id = labelMat.at<uchar>(i,j);
            if(0 == id)
            {
                continue;
            }
            colorMat.at<cv::Vec3b>(i, j)[0] = colormap[id * 3];
            colorMat.at<cv::Vec3b>(i, j)[1] = colormap[id * 3 + 1];
            colorMat.at<cv::Vec3b>(i, j)[2] = colormap[id * 3 + 2];
        }
    }
    auto ttt = std::chrono::duration_cast<std::chrono::milliseconds>
            (std::chrono::steady_clock::now() - t0).count();
    std::cout << "consume time="<<ttt<<"ms"<<std::endl;

49.torch.gather

Pure pytorch side (reposted from https://www.jianshu.com/p/5d1f8cd5fe31):
torch.gather(input, dim, index, out=None) → Tensor
Gathers values along the axis dim, at the positions given by the index tensor index.
For a 3-D tensor, the output is defined as:

out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2

Parameters:
input (Tensor) – the source tensor
dim (int) – the axis to index along
index (LongTensor) – indices of the elements to gather (must be of type torch.LongTensor)
out (Tensor, optional) – the destination tensor

Example:
dim = 1

import torch
a = torch.randint(0, 30, (2, 3, 5))
print(a)
#tensor([[[ 18.,   5.,   7.,   1.,   1.],
#         [  3.,  26.,   9.,   7.,   9.],
#         [ 10.,  28.,  22.,  27.,   0.]],

#        [[ 26.,  10.,  20.,  29.,  18.],
#         [  5.,  24.,  26.,  21.,   3.],
#         [ 10.,  29.,  10.,   0.,  22.]]])
index = torch.LongTensor([[[0,1,2,0,2],
                          [0,0,0,0,0],
                          [1,1,1,1,1]],
                        [[1,2,2,2,2],
                         [0,0,0,0,0],
                         [2,2,2,2,2]]])
print(a.size()==index.size())
b = torch.gather(a, 1,index)
print(b)
#True
#tensor([[[ 18.,  26.,  22.,   1.,   0.],
#         [ 18.,   5.,   7.,   1.,   1.],
#         [  3.,  26.,   9.,   7.,   9.]],

#        [[  5.,  29.,  10.,   0.,  22.],
#         [ 26.,  10.,  20.,  29.,  18.],
#         [ 10.,  29.,  10.,   0.,  22.]]])

dim =2

c = torch.gather(a, 2,index)
print(c)
#tensor([[[ 18.,   5.,   7.,  18.,   7.],
#         [  3.,   3.,   3.,   3.,   3.],
#         [ 28.,  28.,  28.,  28.,  28.]],

#       [[ 10.,  20.,  20.,  20.,  20.],
#        [  5.,   5.,   5.,   5.,   5.],
#        [ 10.,  10.,  10.,  10.,  10.]]])

dim = 0

index2 = torch.LongTensor([[[0,1,1,0,1],
                          [0,1,1,1,1],
                          [1,1,1,1,1]],
                        [[1,0,0,0,0],
                         [0,0,0,0,0],
                         [1,1,0,0,0]]])
d = torch.gather(a, 0,index2)
print(d)
#tensor([[[ 18.,  10.,  20.,   1.,  18.],
#         [  3.,  24.,  26.,  21.,   3.],
#         [ 10.,  29.,  10.,   0.,  22.]],

#       [[ 26.,   5.,   7.,   1.,   1.],
#         [  3.,  26.,   9.,   7.,   9.],
#         [ 10.,  29.,  22.,  27.,   0.]]])

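I had read this before, and on seeing it again I was confused all over again, so I am recording it here. The key line is
out[i][j][k] = input[i][index[i][j][k]][k] # if dim == 1
What is gather actually good for? The output has the same shape as index (in the examples above, index and input happen to have the same shape). Work through one or two entries by hand, for example with dim=1:
output[0][0][0] = input[0] [index[0][0][0]] [0]: first look up index[0][0][0], which is 0, then read input[0][0][0].
That is the whole procedure, so index holds subscripts, and its values must be smaller than the size of input along dim.
Intuitively, gather builds the output by applying a new mapping along one dimension, and index is that mapping.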
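The dim == 1 rule can be spelled out as three nested loops. This is a minimal sketch in plain C++ on nested vectors rather than tensors; Tensor3 and gather_dim1 are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Tensor3 = std::vector<std::vector<std::vector<int>>>;

// out[i][j][k] = input[i][index[i][j][k]][k]   (the dim == 1 rule)
// The output takes the shape of `index`.
Tensor3 gather_dim1(const Tensor3& input, const Tensor3& index) {
    assert(index.size() <= input.size());
    Tensor3 out(index.size());
    for (std::size_t i = 0; i < index.size(); ++i) {
        out[i].resize(index[i].size());
        for (std::size_t j = 0; j < index[i].size(); ++j) {
            out[i][j].resize(index[i][j].size());
            for (std::size_t k = 0; k < index[i][j].size(); ++k) {
                int src = index[i][j][k];
                // index values must be valid positions along dim 1 of input
                assert(src >= 0 && static_cast<std::size_t>(src) < input[i].size());
                out[i][j][k] = input[i][src][k];
            }
        }
    }
    return out;
}
```

Feeding it the first batch of a and index from the example above reproduces the first block of b: the row [0,1,2,0,2] picks 18, 26, 22, 1, 0.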

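50. torch::argsort (not available in libtorch 1.0) and torch::sort

I had written a libtorch project against 1.1, but the target project uses 1.0, so I ported the code from 1.1 to 1.0 and got:
error: ‘argsort’ is not a member of ‘torch’
So the function is missing because of the version difference. Where to find a replacement for argsort? I remembered that max also returns indices, then noticed sort, tried it, and the result matches argsort.
//pytorch1.1
torch::Tensor edge_idx_sort2 = torch::argsort(edge_num, 2, true);
//pytorch1.0
std::tuple<torch::Tensor,torch::Tensor> sort_ret = torch::sort(edge_num, 2, true);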
// torch::Tensor v = std::get<0>(sort_ret);
torch::Tensor edge_idx_sort = std::get<1>(sort_ret);
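Why the index half of sort equals argsort is easy to see on a 1-D array. Here is a minimal sketch in plain C++ using only the standard library; the argsort function below is a hypothetical illustration, not the libtorch one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices that would sort `v`, descending when `descending` is true.
// Sorting the indices by the values they point to yields exactly the
// index tensor that sort returns next to the sorted values.
std::vector<std::size_t> argsort(const std::vector<float>& v, bool descending) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) {
                         return descending ? v[a] > v[b] : v[a] < v[b];
                     });
    assert(idx.size() == v.size());
    return idx;
}
```

For the values {0.3, 0.9, 0.1}, the descending order of indices is 1, 0, 2.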

51. Checking whether a tensor is empty: ind_mask.sizes().empty()

int row = ind_mask.size(0);
If ind_mask is empty, this line crashes with:

terminate called after throwing an instance of 'c10::Error'
  what():  dimension specified as 0 but tensor has no dimensions (maybe_wrap_dim at /data_1/leon_develop/pytorch/aten/src/ATen/core/WrapDimMinimal.h:9)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f4cf0a4af5a in /data_2/project_202009/chejian/3rdparty/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0x48a74f (0x7f4d010af74f in /data_2/project_202009/chejian/3rdparty/libtorch/lib/libcaffe2.so)
frame #2: at::native::size(at::Tensor const&, long) + 0x20 (0x7f4d010afac0 in /data_2/project_202009/chejian/3rdparty/libtorch/lib/libcaffe2.so)
frame #3: at::Tensor::size(long) const + 0x36 (0x467fba in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)
frame #4: deal_mask_index(at::Tensor, at::Tensor) + 0x1a7 (0x45a83e in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)
frame #5: get_gcn_feature(at::Tensor, at::Tensor, at::Tensor, int, int) + 0x4f3 (0x45e092 in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)
frame #6: init_poly(std::shared_ptr<torch::jit::script::Module> const&, std::shared_ptr<torch::jit::script::Module> const&, at::Tensor const&, std::tuple<at::Tensor, at::Tensor, at::Tensor> const&) + 0x168 (0x45e777 in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)
frame #7: main + 0xaee (0x463ab5 in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)
frame #8: __libc_start_main + 0xf0 (0x7f4ced29c840 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: _start + 0x29 (0x456b89 in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)

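So it is necessary to check whether the tensor is empty. However:

ind_mask.numel() // returns the total element count, but returns 1 when the tensor has no dimensions
ind_mask.sizes()// returns something like a python list, e.g. [1, 100, 40, 2]  [1, 40, 2]

I followed ind_mask.sizes() into the libtorch source: it returns IntList. Following further, using IntList = ArrayRef<int64_t>;. Looking at the ArrayRef class, I found: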

  /// empty - Check if the array is empty.
  constexpr bool empty() const {
    return Length == 0;
  }

So there is a member function that checks for emptiness:
if(ind_mask.sizes().empty())
{
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This is too hard!
I thought that had settled it.

if(ind_mask.sizes().empty())
    {
        torch::Tensor tmp;
        return tmp;
    }

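When a tensor is found to be empty, I create a new tensor and return it, because the function's return type is torch::Tensor.
But calling sizes() on a tensor created this way also throws!
As follows: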

 torch::Tensor tmp;
 tmp.print(); // prints [UndefinedTensor]

if(tmp.sizes().empty())
{
}

[UndefinedTensor]
terminate called after throwing an instance of 'c10::Error'
  what():  sizes() called on undefined Tensor (sizes at /data_1/leon_develop/pytorch/aten/src/ATen/core/UndefinedTensorImpl.cpp:12)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f35f1b21f5a in /data_2/project_202009/chejian/3rdparty/libtorch/lib/libc10.so)
frame #1: at::UndefinedTensorImpl::sizes() const + 0x77 (0x7f360217d6b7 in /data_2/project_202009/chejian/3rdparty/libtorch/lib/libcaffe2.so)
frame #2: at::Tensor::sizes() const + 0x27 (0x45e921 in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)
frame #3: main + 0x55 (0x45bcaa in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)
frame #4: __libc_start_main + 0xf0 (0x7f35ee373840 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: _start + 0x29 (0x44f889 in /data_2/project_202009/libtorch/snake_libtorch_cuda8/cmake-build-debug/example-app)

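However, in this case:

 torch::Tensor tmp;
tmp.print();
std::cout<<tmp.numel()<<std::endl; // prints 0

So a default-constructed tensor has .numel() equal to 0. Such a tensor is undefined, so tmp.defined() (false here) is another check that does not throw.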

52. Writing the pytorch code out = aim[ind_mask] in libtorch

pytorch code:
out = aim[ind_mask]
with the following shapes:
aim [21, 40, 2]
ind_mask [21] # every element is 0 or 1, e.g. twelve 1s
out has shape [12,40,2]
#####################################
How to express the pytorch code out = aim[ind_mask] above in libtorch:

    torch::Tensor a = torch::rand({5,3,2});
    torch::Tensor idx = torch::zeros({5}).toType(torch::kLong);
    idx[3] = 1;
    idx[1] = 1;

    torch::Tensor abc = torch::nonzero(idx);
    torch::Tensor b = a.index_select(0,abc.squeeze());

    std::cout<<a<<std::endl;
    std::cout<<abc<<std::endl;
    std::cout<<b<<std::endl;

Output:

(1,.,.) = 
  0.1767  0.8695
  0.3779  0.3531
  0.3413  0.3734

(2,.,.) = 
  0.9664  0.7723
  0.8640  0.7289
  0.8395  0.6344

(3,.,.) = 
  0.9043  0.2671
  0.9901  0.2966
  0.0347  0.1650

(4,.,.) = 
  0.1457  0.1169
  0.7983  0.5157
  0.6405  0.2213

(5,.,.) = 
  0.7977  0.4066
  0.6691  0.7191
  0.5897  0.7400
[ Variable[CPUFloatType]{5,3,2} ]
 1
 3
[ Variable[CPULongType]{2,1} ]
(1,.,.) = 
  0.9664  0.7723
  0.8640  0.7289
  0.8395  0.6344

(2,.,.) = 
  0.1457  0.1169
  0.7983  0.5157
  0.6405  0.2213
[ Variable[CPUFloatType]{2,3,2} ]
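The two steps, nonzero followed by index_select, can be mimicked on ordinary vectors. The sketch below is plain C++ with rows standing in for the slices along dim 0; nonzero and index_select_rows are hypothetical stand-ins, not the libtorch functions themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// nonzero: positions where the mask is non-zero.
std::vector<std::size_t> nonzero(const std::vector<int>& mask) {
    std::vector<std::size_t> pos;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i] != 0) pos.push_back(i);
    return pos;
}

// index_select along dim 0: copy the rows listed in `pos`, in that order.
std::vector<std::vector<float>> index_select_rows(
        const std::vector<std::vector<float>>& rows,
        const std::vector<std::size_t>& pos) {
    std::vector<std::vector<float>> out;
    out.reserve(pos.size());
    for (std::size_t p : pos) {
        assert(p < rows.size());
        out.push_back(rows[p]);
    }
    return out;
}
```

With the mask {0,1,0,1,0} from the example above, nonzero gives positions 1 and 3, and the selection keeps the second and fourth slices, matching the printed output.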

53. Expressing the pytorch code a4 = arr[...,3,0] in libtorch, using masked_select

>>> import numpy as np
>>> arr = np.arange(40).reshape(1,5,4,2)
>>> arr
array([[[[ 0,  1],
         [ 2,  3],
         [ 4,  5],
         [ 6,  7]],

        [[ 8,  9],
         [10, 11],
         [12, 13],
         [14, 15]],

        [[16, 17],
         [18, 19],
         [20, 21],
         [22, 23]],

        [[24, 25],
         [26, 27],
         [28, 29],
         [30, 31]],

        [[32, 33],
         [34, 35],
         [36, 37],
         [38, 39]]]])
>>> a1 = arr[...,0,1]
>>> a2 = arr[...,1,0]
>>> a3 = arr[...,2,1]
>>> a4 = arr[...,3,0]
>>> print(a1)
[[ 1  9 17 25 33]]
>>> print(a2)
[[ 2 10 18 26 34]]
>>> print(a3)
[[ 5 13 21 29 37]]
>>> print(a4)
[[ 6 14 22 30 38]]
>>>

I struggled with this for a long time without finding a good approach, and first did it with a for loop:

//ex shape[1,5,4,2]      ex[..., 0, 1]  -->>[1,5]
torch::Tensor index_tensor_3(const torch::Tensor &ex,const int &idx1,const int &idx2)
{
//    ex.print();
    int dim_ = ex.size(1);
    torch::Tensor out = torch::empty({1,dim_}).to(ex.device());
    int size_ = ex.size(1);
    for(int i=0;i<size_;i++)
    {
        auto a = ex[0][i][idx1][idx2];
        out[0][i] = a;
        //        std::cout<<a<<std::endl;
    }
    
    return out;
}

Then I optimized it to use only libtorch functions:

//ex shape[1,5,4,2]      ex[..., 0, 1] -->>[1,5]
torch::Tensor index_tensor_3(const torch::Tensor &ex,const int &idx1,const int &idx2)
{
    const int dim0 = ex.size(0);
    const int dim1 = ex.size(1);
    const int dim2 = ex.size(2);
    const int dim3 = ex.size(3);

    std::vector<int> v_index(ex.numel());// initialized to ex.numel() zeros
    int offset = dim2 * dim3;
    for(int i=0;i<dim1;i++)
    {
        int index_ = idx1 * dim3 + idx2;
        v_index[i * offset + index_] = 1;
    }

    torch::Tensor index = torch::tensor(v_index).to(ex.device());
    index = index.reshape(ex.sizes()).toType(torch::kByte);//这里需要kByte类型
//    std::cout<<index<<std::endl;

    torch::Tensor selete = ex.masked_select(index).unsqueeze(0);
    return selete;
}

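Following up on the two functions above: over roughly 10 calls in total, the first version takes about 15 ms, while the second takes about 5 ms.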
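The mask in the second version relies on row-major flat indexing: element [0][i][idx1][idx2] of a [1,5,4,2] tensor sits at flat position i*dim2*dim3 + idx1*dim3 + idx2. The sketch below replays that arithmetic in plain Python, with no torch or numpy, to check it against the numpy session above. The helper names `build_mask` and `masked_select` are made up for this illustration and are not libtorch APIs.

```python
def build_mask(shape, idx1, idx2):
    """Flat 0/1 mask that marks [0][i][idx1][idx2] for every i (batch 0 only)."""
    dim0, dim1, dim2, dim3 = shape
    mask = [0] * (dim0 * dim1 * dim2 * dim3)
    offset = dim2 * dim3                      # number of elements in one [i] slice
    for i in range(dim1):
        mask[i * offset + idx1 * dim3 + idx2] = 1
    return mask


def masked_select(flat_values, mask):
    """Keep the values whose mask entry is non-zero, in order."""
    return [v for v, m in zip(flat_values, mask) if m]


shape = (1, 5, 4, 2)
values = list(range(40))                      # same contents as np.arange(40)

a1 = masked_select(values, build_mask(shape, 0, 1))
a4 = masked_select(values, build_mask(shape, 3, 0))
print(a1)  # [1, 9, 17, 25, 33]
print(a4)  # [6, 14, 22, 30, 38]
```

Both results match a1 and a4 from the numpy session. In libtorch itself, chaining select, for example ex.select(2, idx1).select(2, idx2), may be a simpler route to the same [1,5] result. It is not timed here against the two versions above.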

54. Once more: types matter!! Sometimes the type has to be forced explicitly: kernel = kernel.toType(torch::kByte);

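Today's task was to run a pt model made for libtorch 1.0 with libtorch 1.8. After a few syntax tweaks, the old code compiled and ran under the newer version, but the results were wrong. That is troublesome, because I had no idea where the problem was.
The first suspect was that the newer version simply does not support the old model. To check this, I fed identical inputs to inference under the newer and the older version and compared the model outputs. That is laborious too, because getting old-version code to run under a newer PyTorch
takes quite a lot of changes. I had no choice, so I made them and hit all kinds of errors. The model is PSENet, which originally ran on CUDA 8 and Python 2.7. The print statements were not the only problem; there were many others, because the data processing depends on various libraries. In the end I simply deleted all of that,
because I realized inference comes down to this one line:
out = model(img)
so I only needed to prepare the same img. The very long test.py was condensed to the following: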

#encoding=utf-8
import os
import cv2
import sys
import time
import collections
import torch
import argparse
import numpy as np


import models
#import util


def test(args):


    # Setup Model
    if args.arch == "resnet50":
        model = models.resnet50(pretrained=True, num_classes=7, scale=args.scale)
    elif args.arch == "resnet101":
        model = models.resnet101(pretrained=True, num_classes=7, scale=args.scale)
    elif args.arch == "resnet152":
        model = models.resnet152(pretrained=True, num_classes=7, scale=args.scale)
    
    for param in model.parameters():
        param.requires_grad = False

    model = model.cuda()
    
    if args.resume is not None:                                         
        if os.path.isfile(args.resume):
            print("Loading model and optimizer from checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            
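            # model.load_state_dict(checkpoint['state_dict'])
            # Drop the first 7 characters of every key before loading; 7 is the
            # length of "module.", the prefix nn.DataParallel adds (presumably why).
            d = collections.OrderedDict()
            for key, value in checkpoint['state_dict'].items():
                tmp = key[7:]
                d[tmp] = value
            model.load_state_dict(d)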

            print("Loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
            sys.stdout.flush()
        else:
            print("No checkpoint found at '{}'".format(args.resume))
            sys.stdout.flush()

    model.eval()

    img_tmp = torch.rand(1, 3, 963, 1280).cuda()
    traced_script_module = torch.jit.trace(model, img_tmp)
    traced_script_module.save("./myfile/22.pt")

    init_seed = 1  # use the same seed so the same random numbers are generated
    torch.manual_seed(init_seed)
    torch.cuda.manual_seed(init_seed)

    img_tmp = torch.rand(1, 3, 64, 64).cuda()
    out = model(img_tmp)
    print(img_tmp)
    print(out)


    print("save pt ok!")
    return 1


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Hyperparams')
    parser.add_argument('--arch', nargs='?', type=str, default='resnet50')
    parser.add_argument('--resume', nargs='?', type=str, default="./myfile/checkpoint.pth.tar",    
                        help='Path to previous saved model to restart from')
    parser.add_argument('--binary_th', nargs='?', type=float, default=1.0,
                        help='binary threshold')
    parser.add_argument('--kernel_num', nargs='?', type=int, default=3,
                        help='number of kernels')
    parser.add_argument('--scale', nargs='?', type=int, default=1,
                        help='scale')
    parser.add_argument('--long_size', nargs='?', type=int, default=1280,
                        help='long side size')
    parser.add_argument('--min_kernel_area', nargs='?', type=float, default=10.0,
                        help='min kernel area')
    parser.add_argument('--min_area', nargs='?', type=float, default=300.0,
                        help='min area')
    parser.add_argument('--min_score', nargs='?', type=float, default=0.93,
                        help='min score')
    
    args = parser.parse_args()
    test(args)
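The script removes the first 7 characters of every checkpoint key with key[7:]. Seven is the length of "module.", the prefix that torch.nn.DataParallel puts in front of parameter names, so that is presumably what is being stripped; this is an assumption, since the checkpoint itself is not shown. A slightly safer variant, sketched below with the standard library only, strips the prefix only from keys that actually start with it. The helper name `strip_prefix` and the example keys are hypothetical.

```python
import collections


def strip_prefix(state_dict, prefix="module."):
    """Return a new OrderedDict with `prefix` removed from keys that start with it."""
    cleaned = collections.OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Hypothetical keys for illustration; in a real checkpoint the values are tensors.
example = collections.OrderedDict([
    ("module.conv1.weight", 1),
    ("module.bn1.bias", 2),
    ("fc.weight", 3),          # no prefix: left untouched
])
print(list(strip_prefix(example).keys()))
# ['conv1.weight', 'bn1.bias', 'fc.weight']
```

With key[7:], a key without the prefix such as "fc.weight" would be cut down to "ht"; checking startswith first avoids that.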

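This part is important:
init_seed = 1  # use the same seed so the same random numbers are generated
torch.manual_seed(init_seed)
torch.cuda.manual_seed(init_seed)
Because I had to verify the model's accuracy under both torch 1.0 and torch 1.8, the inputs had to be identical, so I set the same seed to get the same random numbers. Printing the inputs confirmed that they matched.
I then found that out did differ, but only around the third decimal place; the leading digits were the same. So I concluded that loading old-version weights in the newer version is OK. Yet in libtorch the results were far apart. Why?
That meant reading the libtorch code carefully!!!
Then came aimless experimenting and printing. Let me say it here: printing is very important!!!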
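The "same except around the third decimal place" check above can be made explicit with a tolerance comparison instead of eyeballing printed tensors. The sketch below uses plain Python lists in place of flattened model outputs. The numbers are made up for illustration, and the 1e-2 tolerance is an assumption chosen to match "differs only at the third decimal", not a value from the original experiment.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max(abs(x - y) for x, y in zip(a, b))


def outputs_match(a, b, tol=1e-2):
    """True if every element differs by at most `tol`."""
    return max_abs_diff(a, b) <= tol


# Made-up values standing in for flattened outputs from the two versions.
old_out = [0.84651, 0.57712, 0.44043]
new_out = [0.84688, 0.57701, 0.44090]   # differs only in later decimals
far_out = [0.12000, 0.93000, 0.40000]   # clearly different

print(outputs_match(old_out, new_out))  # True
print(outputs_match(old_out, far_out))  # False
```

A single number such as the maximum absolute difference also makes it easier to tell "small numerical drift" apart from "something is really wrong", which is the distinction the text draws between the Python and the libtorch results.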
First, here is part of what my old-version libtorch printed:

[ Variable[CPUByteType]{7,703,1280} ]
[Variable[CPUByteType] [7, 703, 1280]]
[Variable[CPUByteType] [3, 703, 1280]]
kernel_size=3
[Variable[CPUByteType] [3, 703, 1280]]

And here is what the newer version printed:

[CUDAFloatType [1, 7, 703, 1280]]
[CPUFloatType [7, 703, 1280]]
[CPUFloatType [3, 703, 1280]]
kernel_size=3
[CPUFloatType [3, 703, 1280]]

Hmm, see that? The data types differ. Why? That told me it was, once again, a data type problem somewhere.
Then I added this line:

kernel = kernel.toType(torch::kByte);

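Problem solved!
Some operations default to CPUByteType in the old version but produce CPUFloatType in the newer one.
A seemingly simple one-liner cost me most of a day!
To sum up, the above is the thought process I followed to track down and fix the problem: keep narrowing down where the problem is, and keep experimenting until it is solved.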
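The side-by-side reading of the two logs above can also be scripted, which helps when there are many printed tensors. The sketch below parses the printed descriptions with a regular expression and reports lines whose dtype differs. It only assumes the print formats shown in the two logs; the pattern and the helper name `parse_tensor_line` are made up for this illustration.

```python
import re

# Matches e.g. "[Variable[CPUByteType] [3, 703, 1280]]", "[CPUFloatType [3, 703, 1280]]"
# and "[ Variable[CPUByteType]{7,703,1280} ]".
_PATTERN = re.compile(r"(CPU|CUDA)([A-Za-z]+)Type\]?\s*[\[{]([\d,\s]+)[\]}]")


def parse_tensor_line(line):
    """Return (device, dtype, shape) for a printed tensor description, else None."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    device, dtype, dims = m.groups()
    return device, dtype, tuple(int(d) for d in dims.split(","))


# Lines taken from the two logs above.
old_log = ["[Variable[CPUByteType] [7, 703, 1280]]",
           "[Variable[CPUByteType] [3, 703, 1280]]"]
new_log = ["[CPUFloatType [7, 703, 1280]]",
           "[CPUFloatType [3, 703, 1280]]"]

for old, new in zip(old_log, new_log):
    o, n = parse_tensor_line(old), parse_tensor_line(new)
    if o[1] != n[1]:
        print("dtype mismatch:", o, "vs", n)
```

Both pairs are reported as mismatches (Byte versus Float), which is the difference that kernel.toType(torch::kByte) resolves.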

Here is one more data type problem I ran into recently, this time with OpenCV's Mat.

Mat convertTo3Channels_2(const Mat& binImg)
{
    Mat three_channel = Mat::zeros(binImg.rows,binImg.cols,CV_8UC3);
    vector<Mat> channels;
    for (int i=0;i<3;i++)
    {
        channels.push_back(binImg);
    }
    merge(channels,three_channel);
    three_channel.convertTo(three_channel,CV_8UC3); // important: the type has to be set once more!!
    return three_channel;
}

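Look at the code.
I first declared the matrix as CV_8UC3: Mat three_channel = Mat::zeros(binImg.rows,binImg.cols,CV_8UC3);, because the function has to return 8-bit unsigned data.
three_channel.convertTo(three_channel,CV_8UC3); // important: the type has to be set once more!!
This second conversion is needed, otherwise the returned Mat does not have that type. I did not know how to inspect or print a Mat's type directly, but my debugger's gdb imagewatch shows the type below the image.
I reproduced the issue: with a breakpoint right after
merge(channels,three_channel);
gdb imagewatch reported a float type.

See? I clearly initialized it with Mat three_channel = Mat::zeros(binImg.rows,binImg.cols,CV_8UC3);
That is CV_8UC3, an 8-bit unsigned type, yet after merge it was float. merge allocates its output with the same depth as its input matrices, so the preallocated CV_8UC3 buffer is replaced when binImg is a float Mat, which appears to be the case here.
As a result, later operations on the returned Mat behaved strangely, and it was hard to tell where the problem came from.

Forcing the conversion once more fixes it.
three_channel.convertTo(three_channel,CV_8UC3); // important: the type has to be set once more!!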
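On inspecting a Mat's type without a debugger: Mat::type() returns an integer code, and Mat::depth() and Mat::channels() return its two components. The integer packs the depth into the low 3 bits and (channels - 1) into the bits above, so it can be decoded by hand. The sketch below does that decoding in plain Python as a worked example of the arithmetic. The helper name `cv_type_name` is made up, and the last depth entry follows OpenCV 4 naming (depth code 7 is a user type in OpenCV 3.x).

```python
# Depth codes 0..7 as defined by OpenCV (7 is CV_16F in OpenCV 4).
_DEPTH_NAMES = ["8U", "8S", "16U", "16S", "32S", "32F", "64F", "16F"]


def cv_type_name(type_code):
    """Decode an integer from Mat::type() into a name such as 'CV_8UC3'."""
    depth = type_code & 7                 # low 3 bits hold the depth
    channels = (type_code >> 3) + 1       # remaining bits hold channels - 1
    return "CV_%sC%d" % (_DEPTH_NAMES[depth], channels)


print(cv_type_name(16))  # CV_8UC3
print(cv_type_name(21))  # CV_32FC3
print(cv_type_name(0))   # CV_8UC1
```

So a three-channel Mat that prints type() == 21 after merge is CV_32FC3 rather than the intended CV_8UC3 (16), which is the situation described above.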

Summary:
Types matter.
Types matter.
Types matter.
Important things are worth saying three times.