GitHub project address (partner) | address |
---|---|
Pair programming partner's blog address | address |
Assignment requirements address | address |
1.1 Pairing process
1.2 PSP table
PSP2.1 | Personal Software Process Stages | Estimated time (minutes) | Actual time (minutes) |
---|---|---|---|
· Planning | · Planning | 20 | 20 |
· Estimate | · Estimate how much time this task requires | 25 | 25 |
· Development | · Development | 890 | 1290 |
· Analysis | · Requirements analysis (including learning new technologies) | 60 | 90 |
· Design Spec | · Produce design documents | 30 | 30 |
· Design Review | · Design review (reviewing the design documents with colleagues) | 20 | 20 |
· Coding Standard | · Coding standard (setting appropriate conventions for the current development) | 10 | 10 |
· Design | · Detailed design | 30 | 60 |
· Coding | · Detailed coding | 600 | 900 |
· Code Review | · Code review | 60 | 60 |
· Test | · Testing (self-testing, modifying code, committing changes) | 30 | 60 |
· Reporting | · Reporting | 30 | 30 |
· Test Report | · Test report | 30 | 30 |
· Size Measurement | · Measuring workload | 20 | 20 |
· Postmortem & Process Improvement Plan | · Postmortem and process improvement plan | 60 | 60 |
Total | | 1025 | 1415 |
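2.1 Approach
The assignment is divided into three stages, so we analyzed the requirements in three passes.
- Basic functions
The basic functions are to count the number of characters, the number of valid lines, the total number of words, and the number of distinct valid words with their frequencies, and to output the ten most frequent words in the specified order. The difficult and key point is how to count valid words, that is, how to split words out of the text. Our approach is to use a regular expression. The condition it screens for is a string that begins with English letters, has a length of at least 4, and does not begin with a digit.
- Extended function: command-line parsing
The extended function requires a command-line program that, like a Linux shell command, accepts a number of parameter options. The difficulty here is parsing the command-line arguments. We originally planned to check the args parameter of the Main entry point in order, comparing each argument in turn to decide whether to run a given function. During implementation, however, we found that the assignment requires both mandatory and optional parameters, and that the parameters may appear in any order, so that approach no longer worked. After asking classmates and consulting references, we used the NuGet package CommandLineParse to parse the command-line arguments.
- Extended function: GUI (windowed) program
Before implementing the GUI program, we packaged the computation core of the second version (the one with the extended functions) as a DLL class library and referenced that DLL from the GUI program, which made the program easier to write.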
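The order-independent handling of required and optional parameters described above can be sketched by hand as follows. This only illustrates the idea and is not the NuGet package the project used. The flag names `-i`, `-n`, `-m`, `-o` and the `ArgSketch.Parse` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ArgSketch
{
    // Hypothetical flags: -i is treated as required, the others as optional.
    private static readonly HashSet<string> Known =
        new HashSet<string> { "-i", "-n", "-m", "-o" };

    // Reads "flag value" pairs in any order and returns them as a dictionary.
    public static Dictionary<string, string> Parse(string[] args)
    {
        var options = new Dictionary<string, string>();
        for (int i = 0; i < args.Length; i += 2)
        {
            string flag = args[i];
            if (!Known.Contains(flag))
                throw new ArgumentException("Unknown option: " + flag);
            if (i + 1 >= args.Length)
                throw new ArgumentException("Missing value for " + flag);
            options[flag] = args[i + 1];
        }
        // The required flag is checked after the loop, so its position is free.
        if (!options.ContainsKey("-i"))
            throw new ArgumentException("Required option -i is missing");
        return options;
    }
}
```

Because the flags are collected into a dictionary before any check, `ArgSketch.Parse(new[] { "-n", "5", "-i", "input.txt" })` and the same arguments in the opposite order give the same result. A dedicated parsing library adds type conversion and help text on top of this.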
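2.2 Design and implementation process
We designed two classes. The CalcCore class is responsible for the statistics and contains 5 functional methods. The Options class is responsible for parsing the command-line arguments. There are no dependencies between functions or between classes.
2.3 Program structure diagram and flowchart
Program structure diagram
Command-line program flowchart
2.4 Unit tests
In the unit tests we designed two test cases for each function.
The test code is shown in the figure
The test txt file is shown in the figure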
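3. Establishing conventions
Pascal: the first letter of every word is capitalized;
A common practice is to use Pascal form for all type, class, and function names, and for all variables to use …
Classes/types/variables: a noun or compound noun, such as Member or ProductInfo. For example, the number of words is named CountOfWord.
Functions: a verb or verb-object combination. For example, the method that counts lines is named CalcLine.
Indentation: Tab is set to 4 spaces.
Use parentheses to express precedence in complex conditional expressions.
Braces: use the style in which { and } each occupy a line of their own.
When initializing a variable, always assign its default value as the initial value.
Underscores are used in names in the GUI program.
Comments: for every method of the computation core, state the method's purpose, its parameters, and why it is done this way.
Error handling: any operation that is not otherwise covered must have matching exception handling.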
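4. Mutual code review
- Although we set conventions, I still have some habits that break them. For example, I like to name FileStream and StreamReader objects fs and sr, which does not follow the conventions. However, a function usually contains only one FileStream and one StreamReader object, so my partner did not insist on a change.
- My partner and I are both used to commenting only what a function does, without comments on the individual steps, so when merging code we kept having to ask each other about the reasoning.
- My partner's functional methods also wrote to the output file. I think a function should do one thing, so when integrating the code I moved the file writing into the main function.
- My partner's sorting for the word-frequency statistics was too verbose. After talking with classmates, we found that sorting with Linq greatly reduces the amount of code and the development difficulty.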
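The Linq sorting mentioned in the last point can be sketched as follows, assuming the counts are already in a dictionary. The `TopWordsSketch` class and its `Top` method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopWordsSketch
{
    // Returns up to n entries, highest count first; ties are broken by
    // ordinal order of the word.
    public static List<KeyValuePair<string, int>> Top(
        Dictionary<string, int> counts, int n)
    {
        return counts
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Take(n)
            .ToList();
    }
}
```

The sketch returns a list rather than a dictionary because the .NET documentation leaves the enumeration order of `Dictionary<TKey, TValue>` undefined, so a list is the safer container when the order of the result matters.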
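5. Performance analysis
We found that the most expensive function in the program is the word-frequency function.
Within it, the call that gets the number of elements in the MatchCollection takes the largest share.
So we modified the code to reduce the number of calls to it.
Honestly, I have not figured out why the percentage is almost the same whether it is called twice or nearly three million times.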
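One plausible explanation, offered as an assumption rather than a confirmed diagnosis of this project: in .NET the `MatchCollection` returned by `Regex.Matches` is populated lazily, and reading `Count` forces every match to be found. The first read therefore carries the cost of the whole regex scan, while later reads only return the size of the already populated collection. A profiler would then attribute most of the time to `Count` whether it is read twice or millions of times. The sketch below times the first read against many repeated reads. The `CountCostSketch` name and the generated sample text are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class CountCostSketch
{
    // Returns (matches found, ms for the first Count read, ms for the later reads).
    public static Tuple<int, long, long> Measure(string text, int repeatedReads)
    {
        MatchCollection mc = Regex.Matches(text, @"\b[a-zA-Z]{4,}\w*");

        var first = Stopwatch.StartNew();
        int count = mc.Count;          // forces the full scan of the text
        first.Stop();

        var later = Stopwatch.StartNew();
        for (int i = 0; i < repeatedReads; i++)
        {
            count = mc.Count;          // the collection is already populated
        }
        later.Stop();

        return Tuple.Create(count, first.ElapsedMilliseconds, later.ElapsedMilliseconds);
    }

    // Builds a sample text made of the given number of words.
    public static string SampleText(int words)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < words; i++)
        {
            sb.Append("word").Append(i).Append(' ');
        }
        return sb.ToString();
    }
}
```

Comparing the two timings on a large input shows where the time goes on a given machine. Reading `Count` once into a local variable still keeps the loop condition cheap and easy to read.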
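6. Code description
- CalcChar: takes the file path, reads all characters, strips Chinese characters, and returns the length of the remaining string
// The listings in this section need: using System; using System.Collections.Generic;
// using System.IO; using System.Linq; using System.Text.RegularExpressions;
/// <summary>
/// Counts the characters in a file, excluding Chinese characters.
/// </summary>
/// <param name="path">Path of the file to read.</param>
/// <returns>The number of characters left after removing Chinese characters.</returns>
public int CalcChar(string path)
{
    string str;
    // using disposes the reader and stream even if reading throws
    using (FileStream fs = new FileStream(path, FileMode.Open))
    using (StreamReader sr = new StreamReader(fs))
    {
        str = sr.ReadToEnd();
    }
    // Remove characters in the range U+4E00..U+9FA5
    string pattern = @"[\u4e00-\u9fa5]";
    string rest = Regex.Replace(str, pattern, "");
    int charNum = rest.Length;
    Console.WriteLine("Total characters: " + charNum);
    return charNum;
}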
- CalcWords: takes the file path, uses a regular expression to find all qualifying words, and returns the number of words
/// <summary>
/// Counts the total number of words in a file.
/// </summary>
/// <param name="path">Path of the file to read.</param>
/// <returns>The number of matches of the word pattern.</returns>
public int CalcWords(string path)
{
    // A word starts with at least 4 letters, followed by any word characters
    string tool = @"\b[a-zA-Z]{4,}\w*";
    string rest;
    using (FileStream fileStream = new FileStream(path, FileMode.Open))
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        rest = streamReader.ReadToEnd();
    }
    MatchCollection mc = Regex.Matches(rest, tool);
    int res = mc.Count;
    Console.WriteLine("Total words: " + res);
    return res;
}
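One detail worth checking in a word pattern of this shape is the letter class. The range `A-z` (ending in a lowercase z) spans ASCII codes 65 to 122, so it also admits the characters `[`, `\`, `]`, `^`, `_` and the backtick, whereas `A-Z` admits only the uppercase letters. The sketch below compares the two classes on a small input. The `WordPatternSketch` class and its constant names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordPatternSketch
{
    // Range A-z covers ASCII 65..122, which includes [ \ ] ^ _ and the backtick.
    public const string Loose = @"\b[a-zA-z]{4,}\w*";

    // Range A-Z covers only the 26 uppercase letters.
    public const string Strict = @"\b[a-zA-Z]{4,}\w*";

    // Counts the matches of the given pattern in the text.
    public static int Count(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Count;
    }
}
```

For the input `"ab_cd file123 123file"`, the strict pattern matches only `file123`, while the loose one also accepts `ab_cd`, because the underscore falls inside the `A-z` range.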
- CalcLine: takes the file path, reads the file line by line, skips empty lines, and returns the number of valid lines
/// <summary>
/// Counts the non-empty lines in a file.
/// </summary>
/// <param name="path">Path of the file to read.</param>
/// <returns>The number of lines whose length is greater than zero.</returns>
public int CalcLine(string path)
{
    int res = 0;
    using (FileStream fileStream = new FileStream(path, FileMode.Open))
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        string line;
        while ((line = streamReader.ReadLine()) != null)
        {
            // An empty line is not counted
            if (line.Length > 0)
            {
                res += 1;
            }
        }
    }
    Console.WriteLine("Valid lines: " + res);
    return res;
}
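CalcLine treats a line as valid when its length is greater than zero, so a line holding only spaces or tabs is counted. If a valid line is meant to contain at least one non-whitespace character (an assumption, since the requirement text is not reproduced here), `string.IsNullOrWhiteSpace` expresses that check. The `LineCountSketch` class below is a hypothetical illustration that reads from any `TextReader`, so it can be tried without a file.

```csharp
using System;
using System.IO;

public static class LineCountSketch
{
    // Counts lines that contain at least one non-whitespace character.
    public static int CountNonBlank(TextReader reader)
    {
        int count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (!string.IsNullOrWhiteSpace(line))
            {
                count++;
            }
        }
        return count;
    }
}
```

Passing a `StringReader` in a unit test and a `StreamReader` in the program keeps the counting logic separate from file access.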
- CalcWordFrequence: takes the file path and a parameter n, uses a regular expression to find all qualifying words, stores them in a dictionary with their counts, sorts the entries with Linq, and copies the first n key-value pairs into a new dictionary (if n is greater than the number of keys, all key-value pairs are copied); returns the new dictionary
/// <summary>
/// Counts word frequencies and returns the n most frequent words.
/// </summary>
/// <param name="path">Path of the file to read.</param>
/// <param name="n">Maximum number of words to return.</param>
/// <returns>Up to n words with their counts, added in sorted order.</returns>
public Dictionary<string, int> CalcWordFrequence(string path, int n)
{
    string tool = @"\b[a-zA-Z]{4,}\w*";
    Dictionary<string, int> keyValuePairWord = new Dictionary<string, int>();
    string rest;
    using (FileStream fs = new FileStream(path, FileMode.Open))
    using (StreamReader sr = new StreamReader(fs))
    {
        rest = sr.ReadToEnd();
    }
    MatchCollection mc = Regex.Matches(rest, tool);
    // Read Count once; the loop reuses the stored value
    int number = mc.Count;
    for (int i = 0; i < number; i++)
    {
        string tmp = mc[i].ToString();
        if (!keyValuePairWord.ContainsKey(tmp))
        {
            keyValuePairWord.Add(tmp, 1);
        }
        else
        {
            keyValuePairWord[tmp]++;
        }
    }
    // Sort by count (descending), then by word (ascending)
    var res = from pair in keyValuePairWord
              orderby pair.Value descending, pair.Key ascending
              select pair;
    Dictionary<string, int> result = new Dictionary<string, int>();
    int j = 0;
    foreach (var i in res)
    {
        // Stop once n pairs have been copied
        if (j == n)
        {
            break;
        }
        result.Add(i.Key, i.Value);
        j++;
        Console.WriteLine(i.Key + ":" + i.Value);
    }
    return result;
}
- PhraseStat: takes the file path and a parameter m, reads the file line by line, finds every run of m consecutive qualifying words in each line, stores each phrase as a string key in a dictionary with its count, and returns the dictionary
/// <summary>
/// Counts phrases of m consecutive words.
/// </summary>
/// <param name="path">Path of the file to read.</param>
/// <param name="m">Number of words per phrase.</param>
/// <returns>Each phrase found with its number of occurrences.</returns>
public Dictionary<string, int> PhraseStat(string path, int m)
{
    Dictionary<string, int> keyValuesPairPhrase = new Dictionary<string, int>();
    // Tokens that start with a letter; their length is checked per phrase below
    string tool1 = @"\b[a-zA-Z]\w*";
    using (FileStream fs = new FileStream(path, FileMode.Open))
    using (StreamReader sr = new StreamReader(fs))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            MatchCollection mc = Regex.Matches(line, tool1);
            for (int i = 0; i < mc.Count - m + 1; i++)
            {
                string tmp = "";
                bool valid = true;
                for (int j = i; j < i + m; j++)
                {
                    // A token shorter than 4 characters invalidates this phrase
                    if (mc[j].Length < 4)
                    {
                        valid = false;
                        break;
                    }
                    tmp += mc[j].ToString() + " ";
                }
                if (!valid)
                {
                    continue;
                }
                if (!keyValuesPairPhrase.ContainsKey(tmp))
                {
                    keyValuesPairPhrase.Add(tmp, 1);
                }
                else
                {
                    keyValuesPairPhrase[tmp]++;
                }
            }
        }
    }
    Dictionary<string, int> result = new Dictionary<string, int>();
    foreach (var i in keyValuesPairPhrase)
    {
        Console.WriteLine(i.Key + ":" + i.Value);
        result.Add(i.Key, i.Value);
    }
    return result;
}
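The sliding-window idea behind PhraseStat can also be written over a single line of text with Linq. This is a minimal sketch and not the project's code. The `PhraseSketch` class is a hypothetical name, and it mirrors the listing's rule that every token in a phrase must have at least 4 characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseSketch
{
    // Counts every run of m consecutive tokens in one line in which all
    // tokens have at least 4 characters.
    public static Dictionary<string, int> CountPhrases(string line, int m)
    {
        if (m <= 0)
            throw new ArgumentOutOfRangeException(nameof(m));

        var counts = new Dictionary<string, int>();
        List<string> tokens = Regex.Matches(line, @"\b[a-zA-Z]\w*")
            .Cast<Match>()
            .Select(match => match.Value)
            .ToList();

        for (int start = 0; start + m <= tokens.Count; start++)
        {
            List<string> window = tokens.GetRange(start, m);
            if (window.Any(token => token.Length < 4))
                continue; // a short token invalidates this window

            string phrase = string.Join(" ", window);
            int seen;
            counts.TryGetValue(phrase, out seen); // seen stays 0 for a new phrase
            counts[phrase] = seen + 1;
        }
        return counts;
    }
}
```

With the line `"good morning to everyone good morning"` and m = 2, the sketch records `good morning` twice and `everyone good` once, and the windows containing `to` are skipped. Unlike the listing above, `string.Join` leaves no trailing space in the dictionary keys.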
7. Summary
I gained a lot from this assignment. First, I consolidated my C# file reading, regular expressions, Linq, and dictionary usage. Before this I had only studied them, and it took repeated use in practice to master them. Second, pair programming brings two different lines of thinking, which gave me good inspiration during design. Of course, cooperation requires being strict with oneself: functions need explanatory comments and names have to follow the conventions. Finally, mutual code review is a good way to find each other's design blind spots. Some design errors are taken for granted by their author and overlooked, and more partners means more bugs caught. I think this pairing was a case of 1 + 1 > 2.