UFLD & UFLDv2 paper study


UFLD

Lane position is determined in units of rows; that is, lanes are detected based on row anchors.

A New Paradigm for Lane Marking Detection

The authors argue that running speed and the no-visual-clue problem are the two key challenges in lane detection, and that the proposed method effectively addresses both.

Paradigm definition: To address the above issues, the authors propose to formulate lane detection as selecting locations based on global image features. In other words, global features are used to select the correct lane location on each predefined row. In this formulation, a lane is represented as a series of horizontal positions on predefined rows, known as row anchors. To represent a position, the first step is gridding: on each row anchor, the horizontal extent is divided into a number of cells. Lane detection can then be described as selecting certain cells over the predefined row anchors, as shown in Fig. 3(a).


Let the maximum number of lanes be $C$, the number of row anchors be $h$, and the number of grid cells be $w$. Let $X$ be the global image feature and $f^{ij}$ be the classifier that selects the lane location of the $i$-th lane on the $j$-th row anchor. Lane prediction can then be written as:

$$P_{i,j,:} = f^{ij}(X), \quad s.t.\ i \in [1, C],\ j \in [1, h],$$

where $P_{i,j,:}$ is a $(w+1)$-dimensional vector representing the probability of selecting each of the $(w+1)$ grid cells for the $i$-th lane on the $j$-th row anchor. Suppose $T_{i,j,:}$ is the one-hot label of the correct position. The loss function of this formulation is:

$$L_{cls} = \sum_{i=1}^{C} \sum_{j=1}^{h} L_{CE}(P_{i,j,:}, T_{i,j,:}),$$

where $L_{CE}$ is the cross-entropy loss. An extra dimension is used to represent the absence of a lane, so classification is over $(w+1)$ dimensions instead of $w$.

PS: The authors note that this method predicts a probability distribution over all positions on each row anchor based on global features, so the correct location can be selected according to that distribution.
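The selection step described above can be sketched in a few lines of numpy. This is an illustrative sketch of the paradigm, not the authors' code; the logits shape `(C, h, w+1)` and the convention that the last cell is the "no lane" class follow the formulation above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def select_lane_cells(logits):
    """Pick one grid cell per (lane, row anchor); -1 means 'no lane'.

    logits: array of shape (C lanes, h row anchors, w+1 cells), where the
    extra last cell is the background / 'no lane' class.
    """
    prob = softmax(logits, axis=-1)      # (C, h, w+1) probabilities
    cells = prob.argmax(axis=-1)         # most likely cell per anchor
    no_lane = logits.shape[-1] - 1       # index of the background class
    return np.where(cells == no_lane, -1, cells)
```

With a CULane-like setting (C=4, h=18, w=200), the output would be a 4×18 grid of cell indices, one per lane per row anchor.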

How to achieve faster running speed: The difference between the proposed paradigm and segmentation is shown in Fig. 3. Suppose the image size is $H \times W$. In general, the number of predefined row anchors and the grid size are far smaller than the image size, i.e., $h \ll H$ and $w \ll W$. The original segmentation formulation needs to perform $H \times W$ classifications of $(C+1)$ dimensions, while this formulation only needs $C \times h$ classifications of $(w+1)$ dimensions. This greatly reduces the amount of computation: the cost of this formulation is $C \times h \times (w+1)$, while the cost of segmentation is $H \times W \times (C+1)$. For example, using the common setting of the CULane dataset [22], the ideal computational cost of this method is $1.7 \times 10^4$ operations, while that of segmentation is $1.15 \times 10^6$. The computational cost is thus reduced by roughly two orders of magnitude, which allows the formulation to reach extreme speed.
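The cost comparison is simple arithmetic to check. The concrete settings below (4 lanes, 18 row anchors, 200 grid cells, a 288×800 input) are commonly used CULane values assumed for illustration; they reproduce the segmentation figure exactly and give the same order of magnitude ($\sim 10^4$) as the quoted cost for the row-anchor formulation.

```python
# Assumed CULane-style settings, for illustration only
C, h, w = 4, 18, 200     # lanes, row anchors, grid cells
H, W = 288, 800          # input resolution

ours = C * h * (w + 1)   # row-anchor formulation: C*h small classifications
seg = H * W * (C + 1)    # segmentation: one (C+1)-way decision per pixel

print(ours, seg)         # ~1.4e4 vs ~1.15e6
```

The ratio is about 80×, consistent with the two-orders-of-magnitude claim above.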

How to deal with the no-visual-cues problem : To deal with the no-visual-cues problem, it is important to utilize information from other locations, since no visual cues means no information at the target location. For example, a lane is blocked by a car, but we can still locate the lane with information about other lanes, road shape, and even the direction of the car. Therefore, exploiting information from other locations is the key to solving the problem of no visual cues, as shown in Figure 1.

From the perspective of the receptive field, our formulation has a receptive field of the whole image, which is much larger than segmentation methods. Contextual information and information from other locations in the image can be used to solve problems without visual cues. From a learning perspective, based on our formulation, structural loss can also be used to learn prior information such as the shape and orientation of lanes, as shown in Section 3.2. In this way, problems without visual cues can be handled in our paradigm.

Another important benefit is that this paradigm models lane locations row by row, which makes it possible to explicitly establish relationships between different rows. The semantic gap between low-level pixel-wise modeling and the high-level elongated structure of lanes can thus be bridged.

Lane structure loss

In addition to the classification loss, we also propose two loss functions for modeling the positional relationship of lane points. This facilitates the learning of structural information.

The first one is due to the fact that lanes are continuous, i.e. lane points in adjacent row anchors should be close to each other. In our formulation, the position of the lane is represented by a classification vector. Therefore, continuity is achieved by constraining the distribution of classification vectors over adjacent row anchors. In this way, the similarity loss function is:
$$L_{sim} = \sum_{i=1}^{C} \sum_{j=1}^{h-1} \Vert P_{i,j,:} - P_{i,j+1,:} \Vert_1,$$

where $P_{i,j,:}$ is the prediction on the $j$-th row anchor and $\Vert \cdot \Vert_1$ is the $L_1$ norm.

Another structural loss function focuses on the shape of the lanes. Generally, most lanes are straight; even curved lanes still appear mostly straight due to the perspective effect. In this work, a second-order difference equation is used to constrain the shape of the lane, which is zero in the straight case.

To account for shape, the lane position on each row anchor needs to be computed. The intuitive idea is to obtain the position from the classification prediction by finding the maximum response peak. For any lane index $i$ and row anchor index $j$, the position $Loc_{i,j}$ can be expressed as:

$$Loc_{i,j} = \mathop{argmax}_{k} P_{i,j,k}, \quad s.t.\ k \in [1, w],$$

where $k$ is an integer representing the position index. Note that the background grid cell is not counted, so the range of $k$ is only $1$ to $w$ instead of $w+1$.

However, the $argmax$ function is not differentiable and cannot be used with further constraints. In addition, in the classification formulation, the classes have no apparent order, which makes it difficult to establish relationships between different row anchors. To address this, the expectation of the prediction is used as an approximation of the location. The $softmax$ function gives the probability of different positions:

$$Prob_{i,j,:} = softmax(P_{i,j,1:w}),$$

where $P_{i,j,1:w}$ is a $w$-dimensional vector and $Prob_{i,j,:}$ represents the probability at each location. For the same reason as Eq. 4, the background grid cell is not included and the calculation range is only $1$ to $w$. The position expectation is then:

$$Loc_{i,j} = \sum_{k=1}^{w} k \cdot Prob_{i,j,k},$$

where $Prob_{i,j,k}$ is the probability of the $i$-th lane at the $j$-th row anchor and the $k$-th position. The benefits of this localization method are twofold: first, the expectation function is differentiable; second, this operation recovers a continuous position from discrete random variables.

According to the above formula, the second-order difference constraint can be written as:
$$L_{shp} = \sum_{i=1}^{C} \sum_{j=1}^{h-2} \Vert (Loc_{i,j} - Loc_{i,j+1}) - (Loc_{i,j+1} - Loc_{i,j+2}) \Vert_1,$$

where $Loc_{i,j}$ is the position of the $i$-th lane at the $j$-th row anchor. The reason for using second-order differences instead of first-order differences is that the first-order difference is non-zero in most cases, so the network would need extra parameters to learn the distribution of the first-order difference of lane positions. Moreover, the second-order difference constraint is relatively weaker than the first-order one, so its influence is small when the lane is not straight. Finally, the overall structural loss is:

$$L_{str} = L_{sim} + \lambda L_{shp},$$

where $\lambda$ is the loss coefficient.
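The two structural losses can be sketched in numpy as follows. This is a simplified illustration, not the authors' implementation: logits are assumed to have shape `(C, h, w+1)` with the background cell last, and plain sums follow the equations above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structural_loss(logits, lam=1.0):
    """logits: (C, h, w+1); the last cell is the background class."""
    # L_sim: adjacent row anchors should predict similar distributions
    p = softmax(logits, axis=-1)
    l_sim = np.abs(p[:, :-1] - p[:, 1:]).sum()

    # Expected location over the w foreground cells (background excluded)
    prob = softmax(logits[..., :-1], axis=-1)
    k = np.arange(1, logits.shape[-1])       # positions 1..w
    loc = (prob * k).sum(axis=-1)            # (C, h) soft positions

    # L_shp: second-order difference of expected locations,
    # which is zero when the lane is straight
    d2 = (loc[:, :-2] - loc[:, 1:-1]) - (loc[:, 1:-1] - loc[:, 2:])
    l_shp = np.abs(d2).sum()
    return l_sim + lam * l_shp
```

For a lane whose predicted distribution is identical on every row anchor, both terms vanish, matching the intuition that the losses only penalize discontinuity and curvature.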

Feature aggregation

The above loss design mainly focuses on the internal relationship between lanes. In this section, we propose an auxiliary feature aggregation method that is performed on both global context and local features. An auxiliary segmentation task is proposed to model local features using multi-scale features. We use cross-entropy as an auxiliary segmentation loss. In this way, the overall loss of our method can be written as:
$$L_{total} = L_{cls} + \alpha L_{str} + \beta L_{seg},$$

where $L_{seg}$ is the segmentation loss and $\alpha$ and $\beta$ are loss coefficients. The overall architecture is shown in Figure 4.

Note that the auxiliary segmentation task is only used during the training phase and is removed during testing. Therefore, even though an extra segmentation task is added, it does not affect the running speed of the method: at inference it is identical to the network without the auxiliary task.


UFLDv2

On the basis of UFLD, column anchors are added alongside row anchors to form hybrid anchors, and the lane detection task is reformulated as an ordinal classification problem to obtain lane coordinates.

We extend the row-anchor lane representation to a hybrid anchor system. The observation is that a row anchor system may not suit all types of lanes and may amplify localization error. As shown in Figures 2a and 2b, when row anchors are used, the localization accuracy of side lanes is significantly lower than that of ego lanes. What if column anchors are used instead? In Figure 2c, the opposite phenomenon appears: the column anchor system localizes ego lanes less accurately. We refer to this as the amplified localization error problem. It makes it difficult for row anchors to locate more horizontal lanes (side lanes), and likewise difficult for column anchors to locate more vertical lanes (ego lanes). Based on these observations, mixed (row and column) anchors are used to represent different lanes: row anchors for ego lanes and column anchors for side lanes. This alleviates the amplified localization error problem and improves performance.


In a hybrid anchor system, lanes can be represented by coordinates on the anchors. How to learn these coordinates effectively is another important issue. The most straightforward way is regression, but regression methods are generally suited to local-scale prediction and are relatively weak at long-range, global localization. To cope with global localization, a classification-based lane coordinate learning method is proposed, which uses different classes to represent different coordinates; this is further extended to ordinal classification. In ordinal classification, adjacent classes have a close, ordered relationship, which differs from ordinary classification. For example, in the ImageNet classification task, class 7 is stingray (a type of fish) and class 8 is rooster; in this work, by contrast, the classes are ordered (e.g., the lane coordinate of class 8 is always spatially to the right of class 7). Another property of ordinal classification is that the class space is continuous: a non-integer class like 7.5 is meaningful and can be seen as an intermediate class between classes 7 and 8. To realize ordinal classification, two loss functions jointly model the ordered relationship between classes: a base classification loss and a mathematical expectation loss. Using the order relation and the continuous class space, argmax can be replaced with the mathematical expectation to obtain a predicted continuous class [19]. The expectation loss constrains the predicted continuous class to equal the ground truth. By constraining the base loss and the expectation loss simultaneously, the output exhibits a better ordered relationship, which benefits lane localization.


Anchor-based lane representation

To represent lanes, row anchors are introduced for lane detection, as shown in Figure 3: lanes are represented by points on row anchors. However, a row anchor system may cause the amplified localization error problem shown in Figure 2, so the row anchor system is further extended to a hybrid anchor system.

The reason for this problem is shown in Figure 5. Without any anchor system, assume the ideal minimum localization error is $\varepsilon$, which may be caused by network bias, labeling error, etc. The error band of the row anchor system is magnified by $\frac{1}{\sin\theta}$, where $\theta$ is the angle between the lane and the anchor. When $\theta$ is small, the magnification factor $\frac{1}{\sin\theta}$ tends to infinity; for example, when a lane is strictly horizontal, it cannot be represented by a row anchor system at all. This makes it difficult for row anchors to locate more horizontal lanes (usually side lanes), and likewise difficult for column anchors to locate more vertical lanes (usually ego lanes). Conversely, when the lane is perpendicular to the anchor ($\theta = 90°$), the error introduced by the anchor system is minimal and equals the ideal localization error $\varepsilon$.


Based on the above observations, hybrid anchors are further proposed to represent lanes. For different lane types, different anchor systems are used to reduce the amplified localization error. Specifically, the rule is: only one kind of anchor can be assigned to a lane, and the anchor type more perpendicular to that lane is chosen. In practice, lane detection datasets such as CULane [2] and TuSimple [53] only label two ego lanes and two side lanes, as shown in Figure 2a. Therefore, row anchors are used for ego lanes and column anchors for side lanes, and the hybrid anchor system alleviates the amplified localization error problem.

In the hybrid anchor system, a lane can be represented as a series of coordinates on its anchors, as shown in Figure 4. Let $N_{row}$ denote the number of row anchors and $N_{col}$ the number of column anchors. For each lane, we first assign the anchor system with the smaller localization error. We then compute the intersections of the lane with each anchor and record their coordinates. If the lane does not intersect some anchor, the corresponding coordinate is set to $-1$. Suppose the number of lanes assigned to row anchors is $N^r_{lane}$ and the number assigned to column anchors is $N^c_{lane}$. The lanes in an image can then be represented by a fixed-size target $T$, where each element is either a lane coordinate or $-1$, and its length is $N_{row} \times N^r_{lane} + N_{col} \times N^c_{lane}$. $T$ can be divided into two parts, $T^r$ and $T^c$, corresponding to the row-anchor and column-anchor parts, with sizes $N_{row} \times N^r_{lane}$ and $N_{col} \times N^c_{lane}$, respectively.
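Building the row-anchor part of the target $T$ for one lane can be sketched as follows. The lane input format (a list of `(x, y)` points) and the use of linear interpolation between labeled points are assumptions for illustration, not the datasets' exact formats.

```python
import numpy as np

def row_anchor_target(lane_pts, row_ys, img_w):
    """Normalized x coordinate of one lane at each row anchor y,
    or -1 where the lane does not reach that anchor.

    lane_pts: list of (x, y) points along the lane (hypothetical format).
    row_ys:   y coordinates of the row anchors.
    img_w:    image width, used to normalize x into [0, 1].
    """
    pts = sorted(lane_pts, key=lambda p: p[1])        # sort by y
    xs = np.array([p[0] for p in pts], dtype=float)
    ys = np.array([p[1] for p in pts], dtype=float)
    t = np.full(len(row_ys), -1.0)                    # -1 = no intersection
    for j, y in enumerate(row_ys):
        if ys[0] <= y <= ys[-1]:                      # lane covers this anchor
            t[j] = np.interp(y, ys, xs) / img_w       # interpolated crossing
    return t
```

The column-anchor part would be built symmetrically, interpolating y as a function of x and normalizing by the image height.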

Anchor Driven Network Design

With the hybrid-anchor lane representation, the network is designed to learn the fixed-size targets $T^r$ and $T^c$ by classification. To do so, the different coordinates in $T^r$ and $T^c$ are mapped to different classes. Suppose $T^r$ and $T^c$ are normalized (their elements range from 0 to 1, or equal $-1$ in the 'no lane' case), and the numbers of classes are $N^r_{dim}$ and $N^c_{dim}$. The mapping can be written as:

$$T^r_{cls\_i,j} = \lfloor T^r_{i,j} N^r_{dim} \rfloor, \qquad T^c_{cls\_m,n} = \lfloor T^c_{m,n} N^c_{dim} \rfloor,$$
$$s.t.\quad i \in \{1,\cdots,N_{row}\},\ j \in \{1,\cdots,N^r_{lane}\},\ m \in \{1,\cdots,N_{col}\},\ n \in \{1,\cdots,N^c_{lane}\},$$

where $T^r_{cls}$ and $T^c_{cls}$ are the class labels mapped from the coordinates, $\lfloor \cdot \rfloor$ is the floor operation, and $T^r_{cls\_i,j}$ is the element of $T^r_{cls}$ at row $i$ and column $j$. In this way, learning the coordinates on the hybrid anchors is converted into classification problems of $N^r_{dim}$ and $N^c_{dim}$ classes. For the no-lane case, i.e., $T^r_{i,j}$ or $T^c_{m,n}$ equal to $-1$, an additional binary classification target is used:
$$T^r_{ext\_i,j} = \begin{cases} 1, & \text{if } T^r_{i,j} \neq -1 \\ 0, & \text{otherwise} \end{cases}, \quad s.t.\ i \in \{1,\cdots,N_{row}\},\ j \in \{1,\dots,N^r_{lane}\},$$

where $T^r_{ext}$ is the existence label and $T^r_{ext\_i,j}$ is the element of $T^r_{ext}$ at row $i$ and column $j$. The existence target $T^c_{ext}$ for column anchors is defined similarly:

$$T^c_{ext\_m,n} = \begin{cases} 1, & \text{if } T^c_{m,n} \neq -1 \\ 0, & \text{otherwise} \end{cases}, \quad s.t.\ m \in \{1,\cdots,N_{col}\},\ n \in \{1,\dots,N^c_{lane}\}.$$
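The coordinate-to-class mapping and existence labels above can be sketched with a hypothetical helper. Note one edge case the floor mapping leaves open: a coordinate of exactly 1 would map to class $N_{dim}$, so the sketch clips at the upper edge; that handling is an assumption, not from the paper.

```python
import numpy as np

def to_targets(t, n_dim):
    """Map normalized coordinates to class and existence labels.

    t:     normalized coordinates in [0, 1], or -1 for 'no lane'.
    n_dim: number of coordinate classes (N_dim).
    Returns (class labels in 0..n_dim-1 or -1, existence labels 0/1).
    """
    cls = np.clip(np.floor(t * n_dim).astype(int), 0, n_dim - 1)
    ext = (t != -1).astype(int)           # 1 where the lane exists
    cls = np.where(ext == 1, cls, -1)     # no class where the lane is absent
    return cls, ext
```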
According to the above derivation, the entire network needs to learn $T^r_{cls}$, $T^c_{cls}$, $T^r_{ext}$, and $T^c_{ext}$, which correspond to the localization and existence branches. Let $X$ be the deep feature of the input image; the network can then be expressed as:

$$P, E = f(flatten(X)),$$

where $P$ and $E$ are the localization and existence outputs, $f$ is the classifier, and $flatten(\cdot)$ is the flattening operation. Both $P$ and $E$ consist of two parts ($P^r$, $P^c$ and $E^r$, $E^c$), corresponding to the row and column anchors, respectively. The dimensions of $P^r$ and $P^c$ are $N^r_{lane} \times N_{row} \times N^r_{dim}$ and $N^c_{lane} \times N_{col} \times N^c_{dim}$, where $N^r_{dim}$ and $N^c_{dim}$ are the mapped classification dimensions for row and column anchors. The dimensions of $E^r$ and $E^c$ are $N^r_{lane} \times N_{row} \times 2$ and $N^c_{lane} \times N_{col} \times 2$.

In the above formula, we directly flatten the deep features of the backbone network and feed them into the classifier. In contrast, traditional classification networks use global average pooling (GAP). We use the flatten operation instead of GAP because we found that for classification-based lane detection networks, spatial information is very important. Using GAP eliminates spatial information, resulting in poor performance.
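The difference between flattening and GAP is easy to see from the shapes involved (the feature map dimensions below are illustrative only):

```python
import numpy as np

feat = np.arange(8 * 10 * 25, dtype=float).reshape(8, 10, 25)  # (C', H', W')

gap = feat.mean(axis=(1, 2))     # (8,): all spatial positions averaged away
flat = feat.reshape(-1)          # (2000,): spatial layout preserved

# GAP produces the same vector under any spatial rearrangement of the
# feature map, so a classifier fed the GAP vector cannot tell *where* a
# response occurred; the flattened vector keeps that location information,
# which the localization branch needs.
```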

Ordinal classification loss

As we saw in Equation 1, an important property of the classification network described above is that there is an ordered relationship between classes. In our classification network, adjacent categories are defined as having close and ordered relationships, which is different from traditional classification. To better utilize the prior knowledge of ordered relations, we propose to use base classification loss and expectation loss.

The base classification loss is defined as:
$$L_{cls} = \sum_{i=1}^{N^r_{lane}} \sum_{j=1}^{N_{row}} L_{CE}(P^r_{i,j}, onehot(T^r_{cls\_i,j})) + \sum_{m=1}^{N^c_{lane}} \sum_{n=1}^{N_{col}} L_{CE}(P^c_{m,n}, onehot(T^c_{cls\_m,n})),$$

where $L_{CE}(\cdot)$ is the cross-entropy loss, $P^r_{i,j}$ is the prediction for the $i$-th lane assigned to row anchors at the $j$-th row anchor, and $T^r_{cls\_i,j}$ is its class label. $P^c_{m,n}$ is the prediction for the $m$-th lane assigned to column anchors at the $n$-th column anchor, $T^c_{cls\_m,n}$ is its class label, and $onehot(\cdot)$ is the one-hot encoding function.

Since the classes are ordered, the expectation of the prediction can be viewed as an average voting result. For convenience, we express the expectation as:
$$Exp^r_{i,j} = \sum_{k=1}^{N^r_{dim}} Prob^r_{i,j}[k] \cdot k, \qquad Exp^c_{m,n} = \sum_{l=1}^{N^c_{dim}} Prob^c_{m,n}[l] \cdot l,$$

where $[\cdot]$ is the indexing operator. The probabilities are defined as:

$$Prob^r_{i,j} = softmax(P^r_{i,j}), \qquad Prob^c_{m,n} = softmax(P^c_{m,n}),$$
In this way, we can constrain the forecasted expectations to be close to the ground truth. Therefore, we have the following expected loss:
$$L_{exp} = \sum_{i=1}^{N^r_{lane}} \sum_{j=1}^{N_{row}} L_1(Exp^r_{i,j}, T^r_{cls\_i,j}) + \sum_{m=1}^{N^c_{lane}} \sum_{n=1}^{N_{col}} L_1(Exp^c_{m,n}, T^c_{cls\_m,n}),$$

where $L_1(\cdot)$ is the smooth $L_1$ loss.
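The expectation loss on one branch can be sketched in numpy as below. The 1-indexed classes follow the expectation equation above; treating the target as a float class and the exact smooth-$L_1$ form are implementation assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def expectation_loss(logits, target_cls):
    """Smooth-L1 between the expected class index and the target class.

    logits:     (..., N_dim) location logits for one branch.
    target_cls: float class labels in 1..N_dim.
    """
    prob = softmax(logits)
    k = np.arange(1, logits.shape[-1] + 1)    # classes 1..N_dim
    exp = (prob * k).sum(axis=-1)             # differentiable 'soft argmax'
    d = np.abs(exp - target_cls)
    # smooth L1: quadratic near zero, linear beyond |d| = 1
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5).mean()
```

When the predicted distribution is sharply peaked on the target class, the loss is near zero; shifting the peak away from the target increases it, which is exactly the push toward the ground truth described next.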

A graphical representation of the expected loss is shown in Figure 7. We can see that the expectation loss can facilitate lane localization by pushing the mathematical expectation of the predicted distribution towards the position of the ground truth.


In addition, the loss $L_{ext}$ of the existence branch is defined as follows:

$$L_{ext} = \sum_{i=1}^{N^r_{lane}} \sum_{j=1}^{N_{row}} L_{CE}(E^r_{i,j}, onehot(T^r_{ext\_i,j})) + \sum_{m=1}^{N^c_{lane}} \sum_{n=1}^{N_{col}} L_{CE}(E^c_{m,n}, onehot(T^c_{ext\_m,n})).$$
Finally, the total loss can be expressed as:
$$L = L_{cls} + \alpha L_{exp} + \beta L_{ext},$$
where α and β are loss coefficients.

Network inference

In this section, we show how to obtain detection results during inference. Taking the row anchor system as an example, suppose $P^r_{i,j}$ and $E^r_{i,j}$ are the predictions for the $i$-th lane at the $j$-th row anchor, with lengths $N^r_{dim}$ and 2, respectively. The probability of each lane position can be expressed as:

$$Prob^r_{i,j} = softmax(P^r_{i,j}),$$

where the length of $Prob^r_{i,j}$ is $N^r_{dim}$. The lane position is then obtained by taking the expectation of the predicted distribution, and predictions for non-existent lanes are filtered out based on the existence branch:

$$Loc^r_{i,j} = \begin{cases} \sum_{k=1}^{N^r_{dim}} Prob^r_{i,j}[k] \cdot k, & \text{if } E^r_{i,j}[2] > E^r_{i,j}[1] \\ -1, & \text{otherwise} \end{cases},$$
Finally, the obtained positions $Loc$ are scaled to the size of the input image. The overall network architecture is shown in Figure 6.
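The inference step for the row-anchor branch can be sketched as follows. Shapes are illustrative, and the equation's 1-indexed $E[1], E[2]$ become 0-indexed `e[..., 0]`, `e[..., 1]` here.

```python
import numpy as np

def decode_row_branch(p, e, n_dim):
    """Decode one row-anchor branch.

    p: (N_lane, N_row, N_dim) location logits.
    e: (N_lane, N_row, 2) existence logits.
    Returns expected class positions, or -1 where no lane exists.
    """
    ex = np.exp(p - p.max(axis=-1, keepdims=True))
    prob = ex / ex.sum(axis=-1, keepdims=True)            # softmax
    loc = (prob * np.arange(1, n_dim + 1)).sum(axis=-1)   # expectation
    exists = e[..., 1] > e[..., 0]                        # existence test
    return np.where(exists, loc, -1.0)
```

The returned class positions would then be rescaled by `image_width / n_dim` (and the column branch by the image height) to obtain pixel coordinates.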



Origin blog.csdn.net/qq_37214693/article/details/132142049