First assignment: thoughts on reading "The Beauty of Mathematics" + coding standards

                            Thoughts on reading "The Beauty of Mathematics"

  When the teacher first asked us to read "The Beauty of Mathematics", I did not understand why. This is not a language class, so why write a book report? And it is not a math class, so why read "The Beauty of Mathematics"? Only after reading it did I find that the book is really useful.

  In fact, I have read only a few chapters so far, but the early chapter on statistical language models not only caught my interest but also gave me a lot of inspiration. The book explains that the probability of a text sequence S appearing is the product of the conditional probabilities of its words: P(S) = P(w1) P(w2 | w1) P(w3 | w1 w2) ... P(wn | w1 w2 ... wn-1), where P(w2 | w1) is the probability that the second word appears given that the first word has appeared. But if the probability of each word depends on all n-1 words before it, the amount of computation is too large to be practical. The Markov assumption is therefore used: the probability of any word wi depends only on the word wi-1 immediately before it. The formula then simplifies to P(S) = P(w1) P(w2 | w1) P(w3 | w2) ... P(wn | wn-1).
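
The simplified bigram formula can be sketched in a few lines of Java. This is only an illustration, not the book's implementation: the class name BigramModel and its methods are hypothetical, and it assumes the plain relative-frequency estimate P(wi | wi-1) = count(wi-1 wi) / count(wi-1) with no smoothing, so any unseen word pair gives probability 0.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bigram language model using unsmoothed relative frequencies. */
final class BigramModel {
    private final Map<String, Integer> unigramCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private int totalWords = 0;

    /** Counts single words and adjacent word pairs in one tokenized sentence. */
    void train(List<String> sentence) {
        for (int i = 0; i < sentence.size(); i++) {
            String word = sentence.get(i);
            unigramCounts.merge(word, 1, Integer::sum);
            totalWords++;
            if (i > 0) {
                bigramCounts.merge(sentence.get(i - 1) + " " + word, 1, Integer::sum);
            }
        }
    }

    /** Returns P(S) = P(w1) * P(w2 | w1) * ... * P(wn | wn-1). */
    double probability(List<String> sentence) {
        if (sentence.isEmpty() || totalWords == 0) {
            return 0.0;
        }
        double p = unigramCounts.getOrDefault(sentence.get(0), 0) / (double) totalWords;
        for (int i = 1; i < sentence.size(); i++) {
            String previous = sentence.get(i - 1);
            int previousCount = unigramCounts.getOrDefault(previous, 0);
            if (previousCount == 0) {
                return 0.0;
            }
            int pairCount = bigramCounts.getOrDefault(previous + " " + sentence.get(i), 0);
            p *= pairCount / (double) previousCount;
        }
        return p;
    }
}
```

Trained on the two sentences "I learn C++" and "I learn Java", the sketch gives P("I learn Java") = 2/6 * 2/2 * 1/2 = 1/6. A real system would add smoothing so that unseen pairs do not force the whole product to zero.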

  This reminds me of the statistical machine translation system I worked on in my freshman year. Every time we finished building a machine translation system, we had to measure its BLEU score with the Moses installation in the lab. The BLEU score measures how similar two sentences are. A simple example: take the two sentences S1 = I learn C++ and S2 = I learn Java. Their similarity is 2/3. The numerator is the number of words of the candidate translation that appear in the reference translations (whether or not in the same reference sentence), and the denominator is the number of words in the candidate translation. The reason for "whether or not in the same reference sentence" is that BLEU compares the machine translation against several reference translations and computes a combined score, so the candidate is compared not with a single sentence but with multiple reference translations. To keep common words from inflating the score, we also used a modified precision: count how many times a word appears in each reference translation, and take the largest of those counts as the limit for that word when computing the final BLEU.
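
The word-matching part of this calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the Moses scoring script: the class UnigramPrecision and the method clippedPrecision are hypothetical names, and the sketch covers only the clipped single-word precision described above. The full BLEU score also combines longer n-grams and a brevity penalty, which are left out here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: clipped unigram precision of a candidate translation. */
final class UnigramPrecision {
    private static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    static double clippedPrecision(List<String> candidate, List<List<String>> references) {
        if (candidate.isEmpty()) {
            return 0.0;
        }
        // For each word, the largest count found in any single reference.
        Map<String, Integer> maxReferenceCounts = new HashMap<>();
        for (List<String> reference : references) {
            for (Map.Entry<String, Integer> entry : countWords(reference).entrySet()) {
                maxReferenceCounts.merge(entry.getKey(), entry.getValue(), Integer::max);
            }
        }
        // A candidate word is credited at most as often as that maximum.
        int matched = 0;
        for (Map.Entry<String, Integer> entry : countWords(candidate).entrySet()) {
            int limit = maxReferenceCounts.getOrDefault(entry.getKey(), 0);
            matched += Math.min(entry.getValue(), limit);
        }
        return matched / (double) candidate.size();
    }
}
```

With the candidate "I learn C++" and the single reference "I learn Java", two of the three candidate words match, so the result is 2/3. With the candidate "the the the" and the reference "the cat", the clipping limits the credit to one match, giving 1/3 instead of 3/3.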

  In addition, the book uses a statistical model to resolve ambiguity in Chinese word segmentation: a statistical language model computes the probability of the sentence under each possible segmentation, and the segmentation with the highest probability is the best one. This made me think of the Corpus Preparation step in the Moses setup process:

Tokenize: insert a space between words and punctuation.
Truecasing: the initial word of each sentence is converted to its most probable casing, which helps reduce data sparsity.
Cleaning: long sentences and empty sentences can cause problems during training, so they are removed, and obviously misaligned sentences are deleted as well.

When preprocessing the corpus, the Chinese side must first be word-segmented, so that the parallel corpus can later be aligned with GIZA++.

 I believe the statistical language models described in "The Beauty of Mathematics" are a great help for statistical machine translation.

   As I slowly read on in "The Beauty of Mathematics", I find there is a lot to learn. Mathematics and computing are inseparable: many algorithms and trained models are rooted in mathematics. I will keep reading "The Beauty of Mathematics", and I believe I will come to a deeper understanding.

The C++ coding standard I will follow:

Reference: The most comprehensive C/C++ coding standard summary - oada's blog - CSDN blog - https://blog.csdn.net/p942005405/article/details/80282572

First, file layout


1. Including header files

 • System header files come first, then user header files.
 • For system headers, whose directory structure is stable, use an include path that contains the subdirectory.
 • For custom headers, whose directory structure is unstable, specify the include path in the dsp file instead.
 • Include system header files with: #include <xxx.h>
 • Include custom header files with: #include "xxx.h"
 • Include only the header files that are needed.

2. .h and .cpp files

 • Header files are named *.h, inline files *.inl, and C++ source files *.cpp.
 • File names are mixed case or all lowercase, for example DiyMainview.cpp, infoview.cpp. Do not use meaningless names such as XImage.cpp, SView.cpp, xlog.cpp.
 • Except in special cases, a header file should use an #ifdef control block.
 • The #endif of a header file should carry a line comment.
 • In a header file, the #include block comes first, followed by the macro definition block, then global variables, global constants, type definitions, class definitions, and the inline section.
 • A CPP file contains include directives, macro definitions, global variables, and function definitions.

3. File structure

 • A file should contain a file header comment and its content.
 • In principle, function bodies are separated by two blank lines; in special cases one blank line or none is acceptable.

4. Blank lines

 • Use two blank lines between the header control block, the #include section, the macro definition section, the class section, the global constants section, the global variables section, and between one function and the next.

Second, comments


1. File header comment

 • Author, file name, file description, creation date (optional)

2. Function comments

 • Key functions must have a comment describing what the function is for.
 • For unusual function parameters, describe the purpose of the parameter, who is responsible for releasing it, and so on.
 • Except in special cases, write comments before the code, not after it on the same line.
 • Put a trailing comment on every #else and #endif.
 • Comment key code, including but not limited to assignments, function calls, expressions, and branches.
 • Code that is not yet fully implemented or needs further optimization should be marked with // TODO ...
 • Debugging code should be marked with // only for DEBUG
 • Code that needs attention should be marked with // NOTE ...
 • At the end of a large block such as for, while, or do, you may add // end for|while|do

Third, naming


1. Principles

 • Consistency: when writing a sub-module or a derived class, follow the naming style of its base class or of the overall module, so that the naming style stays consistent throughout the module.
 • Identifier composition: build identifiers from English words or combinations of them. An identifier should be intuitive and pronounceable, its meaning should be evident from reading it, the wording should be accurate, and pinyin names should be avoided.
 • Minimum length && maximum information: keep the meaning of an identifier clear while making it as short as possible.
 • Avoid near-duplicates: do not use identifiers that differ only in case, for example "i" and "I", or "function" and "Function".
 • Avoid identical names at different scope levels: do not give a local variable exactly the same name as a global variable. The two scopes differ, so there is no syntax error, but the result is misleading.
 • Use correct antonym pairs: name identifiers that have mutually exclusive meanings with correct antonym pairs, such as "nMinValue" and "nMaxValue", or "GetName()" and "SetName()".
 • Avoid numbered names: try not to put numbers in names, as in Value1, Value2, unless the logic really requires numbering. This keeps programmers from lazily settling for meaningless names instead of thinking of a good one (numbering is the easiest way out).

2. T, C, M, R classes

 • T denotes a simple data type class. It does not own resources, so its destructor has no resources to release.
 • C denotes a class that inherits from CBase. Objects of such a class cannot be defined on the stack; they can only be created on the heap.
 • M denotes an interface class.
 • R denotes a resource class, usually a type built into the system. Except in special cases, R types should not appear in development code.

3. Function names

 • Functions of an M class should be named HandleXXX, for example HandleTimerEvent. The Java style, for example handleTimerEvent, is not recommended. Except in standard C-style code, underscores, for example handle_event, are not recommended.
 • A function that may leave takes the suffix L.
 • A function that may leave and pushes onto the cleanup stack takes the suffix LC.
 • A function that may leave and deletes the object takes the suffix LD.

4. Function parameters

 • Function parameters use a as the prefix.
 • Avoid mixing in Hungarian notation, as in apBuffer. aBuffer is enough.
 • When the parameter list is long, consider replacing it with a struct.
 • If many parameters cannot be avoided, consider putting each parameter on its own line with the parameter names vertically aligned.

5. Member variables

 • Member variables use m as the prefix.
 • Avoid mixing in Hungarian notation, as in mpBuffer. mBuffer is enough.

6. Local variables

 • Loop variables and simple variables use short lowercase names, for example int i;
 • Pointer variables begin with p, for example void* pBuffer;

7. Global variables

 • Global variables use g_ as the prefix.

8. Class names

 • Class and object names should be nouns.
 • Member functions that implement behavior should be named with verbs.
 • Member functions that access or query state should be named with nouns or adjectives.

9. Style compatibility

 • Ported or open-source code may keep its original style instead of following these C++ naming conventions.

Fourth, coding style

1. Tabs and spaces

 • Indent the start of each line with Tab only, never with spaces, and use spaces for everything after the indentation. Apart from the leading Tab indentation, any other alignment must be done with spaces. This avoids misaligned display in different editors.
 • There must be no extra spaces at the end of a line of code.
 • Do not add spaces around "::", "->", or ".".
 • Do not add a space before "," or ";".

2. Type definitions and {

 • Classes, structs, enums, unions: braces go on their own line.

3. Functions

 • The { of a function body goes on a new line, with no indentation before it.
 • Except in special cases, a function body must not contain two consecutive blank lines.
 • Except in special cases, a function body must not contain macro definitions.
 • Inside a function body, do not put blank lines between logically closely related statements; elsewhere, separate statements with a blank line.
 • Inline functions defined in a header file may be written without blank lines between them; one blank line between functions is recommended.

4. Code blocks

 • "if", "for", "while", "do", "try", "catch" and similar statements each occupy their own line; the statement to execute must not follow on the same line. No matter how many statements are executed, always add "{}". This prevents mistakes when writing and modifying code.
 • The parenthesized expression of "if", "for", "while", "do", "try", "catch" may be placed directly next to the keyword, which emphasizes the expression.

5. else

 • In an if statement with an else branch, write } else { on one line; splitting it across three lines is not recommended.

6. Lines of code

 • One line of code does one thing, such as defining a single variable or writing a single statement. Such code is easy to read and easy to comment.
 • For multi-line variable definitions, the variable names may be aligned vertically for a neater layout.
 • Keep the maximum line length within a fixed number of characters, so that the whole line is visible on the current screen.

7. switch statement

 • The case and switch keywords should be aligned.
 • If a case clause declares variables, enclose them in {}.
 • If several similar simple cases stand side by side, consider writing each case block on a single line.
 • Blank lines are not needed between simple cases; consider separating complex cases with blank lines.
 • The braces of a case clause go on their own line, not on the same line as the case.
 • Provide a default branch for every switch statement.
 • If a case does not need a break, a comment saying so must be added.

8. Loops

 • A loop with an empty condition may be written as for( ;; ), while( 1 ), or while( true )

Fifth, types


• In pointer and reference definitions, * and & directly follow the type.
• Avoid floating-point numbers unless they are necessary.
• Use typedef to simplify complex syntax.
• Avoid defining unnamed types. For example: typedef enum { EIdle, EActive } TState;
• Use union sparingly; if one is used, its members should be simple data types.
• Use enum to replace (a group of related) constants.
• Do not use magic numbers.
• Prefer references to pointers where possible.
• Initialize a variable immediately when defining it; do not wait until it is used.
• If there is a more elegant solution, do not use casts.

Sixth, expressions


• Avoid using assignment statements inside expressions.
• Avoid testing floating-point values for equality or inequality.
• Do not perform arithmetic on enumeration values and then assign the result to an enumeration type.
• Do not modify the loop counter inside the loop body.
• Test for a non-null pointer with if( p )
• Test for a null pointer with if( ! p )

The Java coding standard I will follow:

Reference: Interpretation of the official Alibaba coding standard - cnblogs - https://www.cnblogs.com/renyuanwei/p/9169452.html

Naming conventions:

1. Names in code must not start with an underscore or a dollar sign, and must not end with an underscore or a dollar sign

2. Names in code must not mix Pinyin and English, and using Chinese directly is not allowed either

3. Class names use the UpperCamelCase style and must follow the camel-case form (with certain exceptions, such as domain-model-related names: DO / BO / DTO / VO)

4. Method names, parameter names, member variables, and local variables all use lowerCamelCase and must follow the camel-case form

5. Constant names are all uppercase (easy to forget), with words separated by underscores

6. Package names are all lowercase, with exactly one English word of natural meaning between dot separators
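
A short sketch of how these naming rules look together in practice. The class OrderSummary, its members, and the package name in the comment are hypothetical and chosen only for illustration.

```java
// Package names are all lowercase with one word per segment, for example
// "package com.example.order;" (a hypothetical package, omitted here so the
// snippet compiles on its own).

/** Class name in UpperCamelCase. */
final class OrderSummary {
    /** Constant name in uppercase, words separated by underscores. */
    static final int MAX_ITEM_COUNT = 100;

    /** Member variable in lowerCamelCase, no leading or trailing underscore. */
    private final int itemCount;

    OrderSummary(int itemCount) {
        this.itemCount = itemCount;
    }

    /** Method name and local variable in lowerCamelCase. */
    int remainingCapacity() {
        int remainingCount = MAX_ITEM_COUNT - itemCount;
        return Math.max(remainingCount, 0);
    }
}
```

Naming the limit MAX_ITEM_COUNT also avoids a magic value: the number 100 appears once, with a name that explains it.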

Nomenclature:

1. Strive for complete and clear meaning; do not worry about the name being long

2. Completely avoid non-standard abbreviations, so that the meaning can always be understood from the text

(It is best not to use abbreviations at all)

Class name:

1. Abstract class names start with Abstract or Base; exception class names end with Exception; test class names start with the name of the class under test and end with Test

2. For Service and DAO classes, following the SOA concept, the exposed service must be an interface, and the internal implementation class is distinguished from the interface by the suffix Impl

3. If an interface name describes an ability, use the corresponding adjective as the interface name

4. Enum class names are recommended to carry the Enum suffix; enum member names must be all uppercase, with words separated by underscores

5. If a design pattern is used, it is recommended to reflect the specific pattern in the class name

6. Package names always use the singular form; if a class name has a plural meaning, the class name may use the plural form

Constant rules:

1. No magic value (that is, a constant that has not been defined) may appear directly in the code

2. Do not maintain all constants in a single constant class; classify constants by function and maintain them separately

3. Constant reuse has five levels: constants shared across applications, constants shared within an application, constants shared within a sub-project, constants shared within a package, and constants shared within a class

4. If a variable's value varies only within a fixed range, use an Enum type; if it also carries extended attributes beyond its name, an Enum class must be used

5. Try not to define variables in an interface; if you must, they should be related to the interface's methods and be basic constants for the whole application

(Core: 1. do not misuse constants; 2. keep constants under control.)

Syntax:

1. When assigning an initial value to a long or Long, you must use an uppercase L, not a lowercase l; the lowercase letter is easily confused with the digit 1 and is misleading

2. Do not add any modifiers (not even public) to the attributes and methods of an interface class, to keep the code concise, and add effective Javadoc comments

3. Every overriding method must carry the @Override annotation

4. A variable-length parameter must be placed last in the parameter list

5. Using final can improve the response efficiency of the program
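
The first four syntax rules can be shown in one small sketch. All names here (SyntaxRules, Shape, Square, join, TIMEOUT_MILLIS) are hypothetical and exist only for this example.

```java
final class SyntaxRules {
    // Uppercase L suffix: a lowercase "30000l" could be misread as 300001.
    static final long TIMEOUT_MILLIS = 30000L;

    /** Interface methods carry no modifiers, not even public. */
    interface Shape {
        /** Returns the area of the shape. */
        double area();
    }

    static final class Square implements Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        /** Every overriding method is marked with @Override. */
        @Override
        public double area() {
            return side * side;
        }
    }

    /** The variable-length parameter comes last in the parameter list. */
    static String join(String separator, String... parts) {
        return String.join(separator, parts);
    }
}
```

With @Override in place, the compiler rejects the method if its signature stops matching the interface, which is the practical reason for the rule.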

Format constraints:

1. Indent with four spaces; tab characters are prohibited

2. A single line is limited to 120 characters

3. Set the IDE text file encoding to UTF-8

4. Use Unix-format line breaks for files in the IDE, not the Windows format

5. Inside a method body, insert one blank line between groups of executed statements, groups of variable definitions, different business logic, or different semantics

Comment rules:

1. All fields of an enum type must be commented, describing the purpose of each data item

2. If a comment cannot be written clearly in English, explain the problem clearly in Chinese instead. Proper nouns and keywords may stay in the original English

3. When commenting out code, add an explanation wherever possible instead of simply commenting it out (unless it will be restored very soon, deleting it directly is recommended)

4. Good naming and code structure are self-explanatory; comments should be concise, accurate, and to the point

Code style:

1. In a switch block, each case must either be terminated by break / return or similar, or carry a comment explaining which case execution will continue to

2. You must use braces in if / else / for / while / do statements, even if there is only one line of code; avoid the following form:

if (condition) statements;

3. It is recommended to minimize the use of else; an if-else can be rewritten as

if (condition) {

...

return obj;

}

// then write the business logic of the else branch here

(Because else only brings the trouble of large indented blocks of code, and also reduces the readability of the code)

4. If you must express the logic with if () ... else if () ... else ..., [mandatory] do not exceed three levels; beyond that, use the state design pattern

5. Apart from common methods (such as getXXX / isXXX), do not execute other complex statements inside a condition; assign the result of a complex logical judgment to a meaningfully named Boolean variable to improve readability

boolean existed = (file.open(fileName, "w") != null) && (...) || (...);

if (existed) {

...

}
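
Rules 1, 3 and 5 of this code style list can be combined in one small sketch: an early return instead of else, a named boolean for the condition, and a switch whose fall-through is explained in a comment. The class SwitchStyle and the method describeDay are hypothetical, as is the numbering of days from 1 (Monday) to 7 (Sunday).

```java
final class SwitchStyle {
    /** Classifies a day number, assuming 1 = Monday through 7 = Sunday. */
    static String describeDay(int dayOfWeek) {
        // Rule 5: name the result of the logical test instead of inlining it.
        boolean isWorkday = (dayOfWeek >= 1) && (dayOfWeek <= 5);

        // Rule 3: return early instead of wrapping the rest in an else block.
        if (isWorkday) {
            return "weekday";
        }

        // Rule 1: every case ends with return, or says where execution continues.
        switch (dayOfWeek) {
            case 6:
                // No break here: execution continues into case 7.
            case 7:
                return "weekend";
            default:
                return "unknown";
        }
    }
}
```

Calling describeDay(3) returns "weekday", describeDay(6) and describeDay(7) return "weekend", and any value outside 1 to 7 returns "unknown".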

Method name:

Service / DAO layer method naming convention:

1. Methods that get a single object are prefixed with get

2. Methods that get multiple objects are prefixed with list

3. Methods that get a statistic are prefixed with count

4. Insert methods are prefixed with save (recommended) or insert

5. Delete methods are prefixed with remove (recommended) or delete

6. Update methods are prefixed with update
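
A minimal sketch of a service that follows these prefixes, together with the interface plus Impl pairing from the class naming rules. UserService, UserServiceImpl, and the in-memory map are hypothetical and stand in for a real service and its storage.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Exposed service interface; method prefixes follow the convention above. */
interface UserService {
    String getUser(int id);                  // single object: get

    List<String> listUsers();                // multiple objects: list

    int countUsers();                        // statistic: count

    void saveUser(int id, String name);      // insert: save

    void updateUser(int id, String name);    // modify: update

    void removeUser(int id);                 // delete: remove
}

/** Internal implementation, distinguished from the interface by the Impl suffix. */
final class UserServiceImpl implements UserService {
    private final Map<Integer, String> users = new LinkedHashMap<>();

    @Override
    public String getUser(int id) {
        return users.get(id);
    }

    @Override
    public List<String> listUsers() {
        return new ArrayList<>(users.values());
    }

    @Override
    public int countUsers() {
        return users.size();
    }

    @Override
    public void saveUser(int id, String name) {
        users.put(id, name);
    }

    @Override
    public void updateUser(int id, String name) {
        users.replace(id, name);
    }

    @Override
    public void removeUser(int id) {
        users.remove(id);
    }
}
```

A caller that reads only the method names can tell which calls return one object, a collection, or a count, and which calls change data.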

I will use these two standards as my coding conventions in my future studies.

 

Origin www.cnblogs.com/snowlxy/p/11442965.html