Oct 21 -- Nov 11

1. Paper reading

A Neural Model for Generating Natural Language Summaries of Program Subroutines ICSE 2019

  1. Problem definition: generate a sentence describing a subroutine in a program.
    Sub-problem: how to represent source code.
  2. Background: source code summarization, NMT (Neural Machine Translation)
  3. Pros and cons of previous work:
    3.1 Tools such as JavaDoc require comments written in a specific format and generate documentation in HTML form. In essence the programmer still writes the summary.
    3.2 Approaches based on content selection and sentence templates need people to design the rules. Whether the result is good or bad depends on the person's design.
    3.3 Inspired by NMT, source code to summary is also treated as a seq2seq problem. The difference from natural language is that the names given to code tokens do not affect the code's behavior, and the semantics of code are usually tied to its structure, control flow and data flow. Among such seq2seq models, CODENN, SBT (Structure-based Traversal), code2seq (ICLR2019) and the FunCom model of this paper differ mainly in how they represent the code.
  4. Improvements:
    4.1 For each piece of code its corresponding AST is also obtained, so the model takes two kinds of input: the code token stream and the AST.
    4.2 In the "no code words" case, the AST alone is used to generate the code summary.
  5. Method:
    The input has two parts, the code token stream and the AST. A seq2seq model with an attention mechanism computes attention over each part. This differs slightly from CODENN and SBT.
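To make the "two inputs, attention over each" idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the dot-product scoring, the toy hidden states, and the names `attend`, `code_states` and `ast_states` are all assumptions for illustration. It only shows one decoder state attending separately to the code-token encoder and the AST encoder, and the two context vectors being joined afterwards.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(decoder_state, encoder_states):
    """Dot-product attention: weights per encoder position and their weighted sum."""
    weights = softmax([dot(decoder_state, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy encoder outputs (made-up numbers, one vector per input position).
code_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # code token stream
ast_states = [[0.5, 0.5], [2.0, 0.0]]               # serialized AST
decoder_state = [1.0, 0.0]

code_weights, code_context = attend(decoder_state, code_states)
ast_weights, ast_context = attend(decoder_state, ast_states)

# Join both contexts with the decoder state; a real model would feed this
# vector into a dense layer that predicts the next summary word.
combined = code_context + ast_context + decoder_state
print(len(combined))
```

Keeping the two attention steps separate lets the decoder weigh a code token and an AST node independently for each output word.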

2. Understanding of terms

internal documentation, code documentation

  1. Documentation
    The paper mentions that internal documentation has an impact on code summarization. I first looked up the difference between "document" and "documentation":
    Documentation is a set of documents provided on paper. It has more denotations than "document". (Wiki)
    Documentation is also the activity of creating documents. (Link)
  2. Internal documentation
    In the paper it means meaningful variable names;
    according to Wiki, it means meaningful variable names and useful comments.
  3. How are code documentation and code summary related?
    Generating the corresponding documentation for code is code documentation;
    generating the corresponding natural language description for code is code summarization.
    So does code summarization count as part of code documentation? That is what I kept thinking while reading the paper, but I do not know whether it is right.

3. Some NMT-based code summarization papers

  1. CODENN (ACL2016)
  2. SBT (ICPC2018)
  3. FunCom, this paper (ICSE2019)
  4. code2seq (ICLR2019)

All four papers use the seq2seq model from NLP to solve the code summarization problem. They differ in how the code is represented.

  • CODENN: a token stream, one-hot encoded. Meaningful variable names and keywords such as if, for and while may reflect the structure of the code.
  • SBT: the AST, which carries the code structure.
  • FunCom (this paper): two inputs, the token stream as in earlier work and the AST flattened into a sequence.
  • code2seq: the AST is expanded into a set of paths.

3.1 CODENN

input: c = c1 c2 c3 ... is the token stream
output: n = n1 n2 ... End is a sentence
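As a rough illustration of a one-hot encoded token stream, the sketch below splits a code snippet into tokens and maps each token to a one-hot vector. The regex tokenizer and the helper names `tokenize`, `build_vocab` and `one_hot` are assumptions made for this example, not CODENN's actual preprocessing.

```python
import re

def tokenize(code):
    """Split code into identifiers/keywords, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def build_vocab(tokens):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab

def one_hot(token, vocab):
    """Vector of zeros with a single 1 at the token's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

code = "while (!done) { done = check(); }"
tokens = tokenize(code)            # the input c = c1 c2 c3 ...
vocab = build_vocab(tokens)
encoded = [one_hot(t, vocab) for t in tokens]
print(tokens)
```

In this form an identifier such as `done` is just one more vocabulary entry, so a name never seen during training has no index. That is the OOV problem raised in the Questions section.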

3.2 SBT

SBT is a way of traversing the AST. The resulting sequence is in fact similar to the tree written out in XML format.
[Figure: model]
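The traversal can be sketched as follows. This is an approximation under stated assumptions: it uses Python's built-in `ast` module instead of a Java parser, and the function names `label` and `sbt` are made up for the example. Each node opens with "(" plus its label and closes with ")" plus the same label, which is what gives the sequence its XML-like look. Identifier nodes are labelled in a type_name style.

```python
import ast

def label(node):
    """Node type, with the identifier appended for Name nodes (type_name style)."""
    name = type(node).__name__
    if isinstance(node, ast.Name):
        return name + "_" + node.id
    return name

def sbt(node):
    """Structure-based traversal: bracket every subtree with its root label."""
    seq = ["(", label(node)]
    for child in ast.iter_child_nodes(node):
        seq.extend(sbt(child))
    seq.extend([")", label(node)])
    return seq

tree = ast.parse("x = 1")
sequence = sbt(tree)
print(" ".join(sequence))
```

Because every subtree is closed with its own label, the original tree can be rebuilt from the flat sequence without ambiguity.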

3.3 FunCom (this paper)

3.4 code2seq

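Input: a set of AST paths.
An AST path is a triple <start, path, end>.
A path runs from one non-terminal / terminal to another non-terminal / terminal. The paper holds that each path can express some fact.
In the figure (not reproduced here), p1 expresses that !d is the loop condition of the while loop.
p4 expresses that the value of d is set to true. Combined with p1, once d = true the while loop terminates.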
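A small sketch of extracting such <start, path, end> triples, assuming Python 3.8+ and its built-in `ast` module rather than the parser used by code2seq. The snippet is a Python analogue of the while-loop example, only `Name` and `Constant` nodes are treated as terminals, and the helper names `terminals`, `ast_path` and `path_triples` are hypothetical.

```python
import ast
from itertools import combinations

def terminals(node, trail=()):
    """Yield (terminal text, nodes from the root down to the terminal)."""
    trail = trail + (node,)
    if isinstance(node, ast.Name):
        yield node.id, trail
    elif isinstance(node, ast.Constant):
        yield repr(node.value), trail
    else:
        for child in ast.iter_child_nodes(node):
            yield from terminals(child, trail)

def ast_path(trail_a, trail_b):
    """Go up from terminal a to the lowest common ancestor, then down to b."""
    shared = 0
    while (shared < min(len(trail_a), len(trail_b))
           and trail_a[shared] is trail_b[shared]):
        shared += 1
    up = [type(n).__name__ for n in reversed(trail_a[shared:])]
    top = type(trail_a[shared - 1]).__name__  # lowest common ancestor
    down = [type(n).__name__ for n in trail_b[shared:]]
    return up + [top] + down

def path_triples(code):
    """All <start, path, end> triples between pairs of terminals."""
    found = list(terminals(ast.parse(code)))
    return [(a, ast_path(ta, tb), b)
            for (a, ta), (b, tb) in combinations(found, 2)]

triples = path_triples("while not d:\n    d = True")
for start, path, end in triples:
    print(start, path, end)
```

The path between the `d` in the loop condition and the `d` being assigned passes through the `While` node, which is the kind of fact a single path is meant to carry.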

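4. Questions

  1. In FunCom, after embedding, are the vectors for the same variable name in the AST and in the code not the same?
    ans: First, the two inputs have different embedding spaces. Second, the variable's name in the original code is changed from name to type_name.
  2. In the experiments, attendgru (input without the AST) and ast-attendgru (input with both the code text and the AST) perform roughly the same in the statistical sense, but on individual examples each has its own strengths. The paper does not explain the reason. Still, this shows that the AST sometimes helps code summarization, sometimes helps very little, and sometimes even hurts.
  3. For the AST, would it be reasonable to use a recursive NN to obtain a vector and use it as the initial state of the code/text part of the input?
  4. Remove the leaf nodes from the AST, serialize it with the method proposed in SBT, run it through a GRU, and use the output as the initial state of the code/text part.
    ans: As I mentioned before, to understand a project's code I want to look at the framework and the calls between functions first, and the details afterwards. Using the non-leaf information of the AST means looking at the whole first and at all of the code later. Done this way, the AST part has no OOV problem to begin with and the main information can be obtained from it, so the OOV problem in the code has less effect on the result.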
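The idea in question 4 can be checked on a toy case. The sketch below assumes Python 3.8+ and its `ast` module, and treats `Name` and `Constant` nodes as the leaves to drop. The function name `structure_sbt` is made up, and the GRU step is left out. Two snippets that differ only in their identifiers produce the same sequence, so this part of the input draws on a fixed vocabulary of node types.

```python
import ast

def structure_sbt(node):
    """SBT-style sequence over node types only, skipping identifier and literal leaves."""
    name = type(node).__name__
    seq = ["(", name]
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.Name, ast.Constant)):
            continue  # leaves carrying identifiers or literal values are dropped
        seq.extend(structure_sbt(child))
    seq.extend([")", name])
    return seq

a = structure_sbt(ast.parse("total = total + price"))
b = structure_sbt(ast.parse("x = x + y"))
print(a == b)
print(" ".join(a))
```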

5. Next plan

  1. Run the models of the four papers on part of the data set to obtain results.
  2. I can now run the code and understand its input and output, but the framework and the specific details of the model part are still unclear. Next I want to figure out how the data files are generated and how the code executes, so that I can implement modifications to the model.
  3. Compiler theory and abstract syntax tree parsing: complete within the next two weeks.
  4. Machine learning: school course + the watermelon book + Li Hang's statistical machine learning book. I should be able to finish a first pass in two months.

    Schedule

No classes on Monday, Wednesday, or all day Sunday. Discussion on Tuesday and Thursday afternoons.

Origin www.cnblogs.com/enshengshi/p/11795296.html