Assignment demonstration

Calculation interface design and implementation:

The overall design ideas are in my partner's blog: xxxxxxx

Overall flow:

The function modules I am responsible for are three classes: GetDic, GetRes, and Print.

The Print class mainly decides from the command-line parameters whether the output goes to a text file or to the console; text output uses the StreamWriter character stream.
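As a minimal sketch of that switch (the method and parameter names here are assumptions, not the project's actual Print class), the console/text decision might look like:

```csharp
using System;
using System.IO;

public static class Print
{
    // If an output path was given on the command line, write to a text
    // file through a StreamWriter character stream; otherwise write to
    // the console.
    public static void Output(string text, string outPath)
    {
        if (outPath != null)
        {
            using (StreamWriter sw = new StreamWriter(outPath))
            {
                sw.Write(text);
            }
        }
        else
        {
            Console.Write(text);
        }
    }
}
```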

The more critical parts are the GetDic and GetRes classes. GetDic uses a dictionary to build the word-frequency collection: I pass in the word collection produced by CountWords, and for each word, if it appears for the first time it is added to the dictionary, while a repeat occurrence increments its frequency, yielding a collection of words with their frequencies.

Flow chart:


The GetRes class processes the data in the collection generated by GetDic, again using a dictionary for the result. I first sort the word collection lexicographically, then walk through the frequencies in descending order over the already-sorted words; matching entries are added to my result set, which is output by the Print function.


Flow chart:


Key code:


GetDic class:


// Takes the word collection produced by CountWords
// and stores each word with its count in the generic dictionary keyValues
public static Dictionary<string, int> createDic(StreamReader sr, List<string> words)
{
    Dictionary<string, int> keyValues = new Dictionary<string, int>();
    // If the dictionary already contains the word, increment its count;
    // otherwise add the word with a count of 1
    foreach (string s in words)
    {
        if (keyValues.ContainsKey(s))
        {
            keyValues[s]++;
        }
        else
        {
            keyValues.Add(s, 1);
        }
    }
    return keyValues;
}


GetRes class:


// Sort the words by frequency and output the n most frequent words
public static void SortKey(Dictionary<string, int> keyValues, Dictionary<string, int> result, int count)
{
    // Sort the collection in lexicographical (ordinal) order
    keyValues = keyValues.OrderBy(o => o.Key, StringComparer.Ordinal).ToDictionary(p => p.Key, o => o.Value);
    // Collect the word frequencies
    List<int> value = new List<int>();
    foreach (int i in keyValues.Values)
    {
        value.Add(i);
    }
    // Sort the frequencies in descending order
    value.Sort((x, y) => -x.CompareTo(y));
    // If the parameter is -1, default to outputting the top 10
    if (count == -1)
    {
        count = 10;
    }
    // Number of words taken so far
    int index = 0;
    foreach (var s in keyValues)
    {
        // Pick out the words with the highest frequency
        if (s.Value.Equals(value[0]) && index < count)
        {
            // Record the word and its frequency
            result.Add(s.Key, value[0]);
            index++;
        }
    }
    // Take the remaining words in descending frequency order;
    // words with the same frequency stay in lexicographical order
    for (int i = 1; i < value.Count && index < count; i++)
    {
        if (value[i] == value[i - 1])
            continue;
        foreach (var s in keyValues)
        {
            if (s.Value.Equals(value[i]))
            {
                if (index < count)
                {
                    // Add the word to the result set in the required format
                    result.Add(s.Key, value[i]);
                    index++;
                }
                else
                    break;
            }
        }
    }
}

Embodiment of the four principles:


Design By Contract:


When a method is called that requires a file at a certain path, the contract is that the data source must be a text file in .txt format; if it is not, the program prompts that the input is wrong.
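A minimal sketch of that precondition check (the method name and message here are assumptions, not the project's exact code):

```csharp
using System;

public static class ContractCheck
{
    // Contract precondition: the data source must be a .txt file.
    // Returns true when the path satisfies the contract; otherwise
    // prints an error prompt, as described above.
    public static bool IsValidSource(string path)
    {
        if (path == null || !path.EndsWith(".txt", StringComparison.OrdinalIgnoreCase))
        {
            Console.WriteLine("Error: the data source must be a .txt file.");
            return false;
        }
        return true;
    }
}
```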


Information Hiding:


Each functional module is independent. Even though I made the methods static, the specific regular expression used in CountWords should be hidden from the other modules, so I made the regular expression private data.
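A sketch of that hiding (the pattern and method name are assumptions; only the private-field idea comes from the text):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CountWords
{
    // The concrete pattern is private: other modules can split text into
    // words without knowing how words are delimited.
    private static readonly Regex wordSplitter = new Regex(@"\W+");

    public static List<string> Split(string text)
    {
        List<string> words = new List<string>();
        foreach (string w in wordSplitter.Split(text))
        {
            if (w.Length > 0)
            {
                words.Add(w.ToLower());
            }
        }
        return words;
    }
}
```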


Interface Design:


Each interface is named according to its function. No interface calls another directly (for example, by directly aggregating method calls); where the result of another interface is needed, it is passed in through a parameter.


Loose Coupling :


Each module can run independently. In the first version we put all of these methods into one utility class and used the same static data to store the results; in discussion with my partner we found the links between modules too tight, so the specific methods were split into independent classes by module, enhancing loose coupling.


Code review process:


The code went through self-review and peer review, which fixed some deficiencies. During review I found problems with code comments and with releasing streams, as well as a flaw in the interface design: the StreamReader class was created repeatedly. I changed it so the StreamReader is created in the main program and passed in as a parameter, saving resources.
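A sketch of that change (names other than StreamReader and createDic are assumptions): the reader is opened once in Main, released by `using`, and shared with each module through a parameter.

```csharp
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        // Open the stream once; the using block guarantees it is released.
        using (StreamReader sr = new StreamReader(args[0]))
        {
            // Each module receives the shared reader as a parameter
            // instead of opening its own, e.g.:
            // var dic = GetDic.createDic(sr, words);
        }
    }
}
```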


During the peer review process, I found an issue in my partner's code:


You should use \W, which matches non-alphanumeric characters, as the split pattern; \w matches the word characters themselves. Using \w splits the text at the letters rather than between words, so the program still runs but the results are wrong.
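The difference can be seen with a short demo (this example is mine, not from the reviewed code):

```csharp
using System;
using System.Text.RegularExpressions;

class RegexDemo
{
    static void Main()
    {
        string text = "hello world";
        // \W+ splits on runs of non-word characters,
        // yielding the words: ["hello", "world"]
        string[] good = Regex.Split(text, @"\W+");
        // \w+ splits on the words themselves, leaving only the
        // delimiters behind: ["", " ", ""] -- the word list is lost
        string[] bad = Regex.Split(text, @"\w+");
        Console.WriteLine(string.Join("|", good)); // hello|world
        Console.WriteLine(string.Join("|", bad));
    }
}
```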


Performance improvement of the interface:




Unit test demonstration:


Tested function: CountChar (character counting class)


Idea: simply read the document, then use an assertion to check that the string length equals the expected value.




Tested function: CountLine (line counting class)


Idea: simply read the document, then use an assertion to check that the line count equals the expected value.




Tested function: CreatWords (word counting class)


Idea: likewise read a text string from a document, use it to drive the tested function that generates the word collection, and check the number of words.



Tested function: GetDic (word-frequency generation)

Idea: I originally wanted to read from a file, but that is slower to validate, so I simulate a word list and use it to drive the tested function, checking that the frequency of each word matches expectations.
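A sketch of that test idea (MSTest attribute style is an assumption; `null` is passed for the StreamReader because the createDic code shown above does not read from it):

```csharp
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class GetDicTest
{
    [TestMethod]
    public void CreateDic_CountsFrequencies()
    {
        // Simulated word list instead of reading a file
        List<string> words = new List<string> { "apple", "banana", "apple" };
        Dictionary<string, int> dic = GetDic.createDic(null, words);
        // Assert each word's frequency matches expectations
        Assert.AreEqual(2, dic["apple"]);
        Assert.AreEqual(1, dic["banana"]);
    }
}
```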


Tested function: GetRes (takes the top n word frequencies and adds them to the result set)

Idea: read a document to generate the word dictionary, drive the tested function with that collection, and output the top 10 words to the console; the results follow.






Tested function: WordGroup (phrase output)

Idea: read the text file to get the word List, drive the tested function to output the three-word phrase collections, and print the test results to the console.




All tests pass.



Exception handling in the calculation module:


For the exception handling mechanism, I ignore exceptions that arise from code defects; my main concern here is exceptions related to reading the document. I designed several related checks: before the main logic runs, if/else statements verify that a file and a valid format were entered, since the data source is the most important input. Even with a correct file name and format, a wrong path will cause the file read to fail, so I use a try/catch statement to capture the exception, as follows:
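A sketch of that capture (the class and message are assumptions; only the try/catch-around-StreamReader idea comes from the text):

```csharp
using System;
using System.IO;

static class SafeRead
{
    // The path/format if-else checks run first in Main; a resource that
    // cannot be retrieved locally then surfaces here as an exception,
    // which is caught and reported instead of crashing the program.
    public static string ReadAll(string path)
    {
        try
        {
            using (StreamReader sr = new StreamReader(path))
            {
                return sr.ReadToEnd();
            }
        }
        catch (FileNotFoundException)
        {
            Console.WriteLine("Error: the file at the given path could not be found.");
            return null;
        }
    }
}
```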



The test cases for the path and format checks are as follows:



This scenario handles the user entering the wrong file format or forgetting the file path; without this handling mechanism, the program would throw an exception.

This is the case where StreamReader cannot retrieve the local resource while reading the document: the thrown exception is caught by the catch block, a message is output, and the program is interrupted.


The scenario: the user enters a properly formatted file name and path, but the resource does not exist locally, which causes the exception.


Origin www.cnblogs.com/YMIng123/p/11665426.html