PTA Ladder Competition Practice Problem L1-064, "AI Core Code Valued at 100 Million": Test Point Error Analysis, Troubleshooting Techniques, and a Detailed Solution

AI core code valued at 100 million

The above pictures are from Sina Weibo.

This problem asks you to implement a slightly more valuable AI English question-answering program. The rules are:

- No matter what the user says, first print what the user said, unchanged, on one line;
- Eliminate redundant spaces in the original text: replace multiple spaces between adjacent words with one space, delete all spaces at the beginning and end of the line, and delete spaces before punctuation marks;
- Change all uppercase English letters in the original text to lowercase, except I;
- Replace every independent "can you" and "could you" in the original text with "I can" and "I could" respectively. Here "independent" means words separated by spaces or punctuation marks;
- Replace every independent "I" and "me" in the original text with "you";
- Replace every question mark ? in the original text with an exclamation mark !;
- Output the replaced sentence on one line as the AI's answer.

Input format:
The input first gives a positive integer N not exceeding 10 on the first line, followed by N lines. Each line contains one user utterance of at most 1000 characters, terminated by a newline. Each utterance is a non-empty string containing only letters, digits, spaces, and visible half-width punctuation marks.

Output format:
Output according to the rules above. Each AI answer must be preceded by "AI: " (AI, a colon, and a space).
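One input detail is easy to trip over: after reading N with cin >> N, the newline that ends the first line is still in the stream, so the next getline returns an empty string. The lines must also be read whole with getline, because operator>> would split them and lose the spaces. A minimal C++ sketch, assuming the input matches the format above; readDialogues is a hypothetical helper name:

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads N and then N raw lines, keeping every space
// exactly as typed (operator>> would split words and lose the spaces).
std::vector<std::string> readDialogues(std::istream& in) {
    int n = 0;
    in >> n;
    // Discard the rest of the first line, including its '\n'; otherwise the
    // first getline below would return an empty string.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::vector<std::string> lines;
    std::string line;
    for (int i = 0; i < n && std::getline(in, line); ++i)
        lines.push_back(line);
    return lines;
}
```

In a real submission this would be called as readDialogues(std::cin), and each stored line is echoed before its answer.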

Input sample:

6
Hello ?
 Good to chat   with you
can   you speak Chinese?
Really?
Could you show me 5
What Is this prime? I,don 't know

Sample output:

Hello ?
AI: hello!
 Good to chat   with you
AI: good to chat with you
can   you speak Chinese?
AI: I can speak chinese!
Really?
AI: really!
Could you show me 5
AI: I could show you 5
What Is this prime? I,don 't know
AI: what Is this prime! you,don't know

It's 1:28 in the morning as I start this blog post, and I'm going to cover this question in as much detail as I can. It cost me a lot of effort: it is full of pitfalls and small details. Below I also list my analysis of the failing test points and the traps behind them. Without further ado, let's get to the point.
At first glance the question is simple: just transform the string according to the rules. But once you submit, one or two test points always fail. Let's analyze them one by one.

No matter what the user says, first print out what the other party said in one line;

The first requirement is easy: just read the line and print it back.

Eliminate redundant spaces in the original text: replace multiple spaces between adjacent words with one space, delete all spaces at the beginning and end of the line, and delete spaces before punctuation marks;

This requirement removes spaces from the original text. The rules are simple and clear, and deleting the spaces at the beginning and end of the line should not be hard to understand.
Note! (To make spaces visible in this article, # stands for a space.)

####abc#d,#ef####
###abcabc#d
sbdd,#####
#,dav##tt
abc,#d#

No matter how many spaces a line starts or ends with, one or more, all of them are removed. For example, the first line above becomes:

abc#d,#ef

Test point 1 appears to test the removal of leading and trailing spaces. If test point 1 fails, you can try the following data:

1
#####abcd#####,####123478#####?#####   

Correct output:

#####abcd#####,####123478#####?#####   
AI:#abcd,#123478!

PS: the space (#) in "AI:#" is not a space from the start of the string; it is the single space required by the output format "AI: ". When the output is correct, the processed string itself has no spaces at its beginning or end.
That part is relatively simple, so I won't dwell on it. Next come two requirements that I analyze together: replace multiple spaces between adjacent words with one space, and delete spaces before punctuation marks. They belong together because punctuation marks and words can sit next to each other. Deleting the extra spaces between two words is easy to understand:

apple###can####I

In the original post some of these spaces were highlighted in red, but the highlight was arbitrary: it does not matter which spaces you delete, as long as exactly one space remains between the words.

apple#can#I

Note! The "word" here is not an English word in the usual sense. In this problem, a word is any run of letters and/or digits, alone or mixed, with no space or punctuation mark inside it.
For example:

aaa
a123
2345
56kda345

All of the above are words in this question.
Words combined with punctuation is where it gets tricky. Spaces before a punctuation mark must be deleted; spaces after it are kept (collapsed to one, like any run of spaces between words).
For example:

12#,##abc
dbc##,#aa#,

In both lines, the spaces before each comma must be deleted, and a run of spaces after a comma shrinks to one:

12,#abc
dbc,#aa,

Below are a few special cases, which are also one of the many pitfalls of this question.

#,abc,#

Although the last space comes after a punctuation mark, it is also a trailing space, and trailing spaces must be deleted. So it has to go:

,abc,

Now look at a trickier example.

aab,#####,####abc

All the spaces between the two commas must be deleted: relative to the first comma they come after it, but relative to the second comma they all come before it, so none of them survive. The spaces after the second comma collapse to one.

aab,,#abc

Another special case: the line starts directly with a punctuation mark. The data of test point 4 is of this kind.

,abc
,#abc

These two examples have no spaces to remove. The output is the same as the original.
If test point 4 fails, especially if it times out, try the following samples.
test 1

1
,dac

correct output

,dac
AI: ,dac

test 2

1
##,dab

correct output

##,dab
AI: ,dab
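All the space rules above can be folded into one look-ahead rule: a run of spaces survives as a single space only if some text has already been emitted and the run is followed by a letter or digit. That single condition removes leading spaces, trailing spaces, and spaces before punctuation at once. A minimal sketch, assuming ASCII input; normalizeSpaces is a hypothetical name, and unlike the code later in this post it looks ahead instead of pushing a space and popping it back off:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Letters and digits make up "words" in this problem.
static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: trims both ends, collapses runs of spaces, and deletes
// spaces before punctuation, all with one look-ahead rule.
std::string normalizeSpaces(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] != ' ') {
            out += s[i++];
            continue;
        }
        std::size_t j = i;  // find the end of this run of spaces
        while (j < s.size() && s[j] == ' ') ++j;
        // Keep one space only between already-emitted text and a word character.
        if (!out.empty() && j < s.size() && isWordChar(s[j])) out += ' ';
        i = j;
    }
    return out;
}
```

With this rule, "aab,     ,    abc" becomes "aab,, abc" and "  ,dab" becomes ",dab", matching the hand-worked examples.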

That covers words and punctuation marks; let's move on.

Change all uppercase English letters in the original text to lowercase, except I; replace all independent "can you" and "could you" in the original text with "I can" and "I could" respectively (here "independent" means words separated by spaces or punctuation marks); replace all independent "I" and "me" in the original text with "you";

There are many pitfalls here. First we have to understand what an independent word is: according to the problem statement, a word is independent when it is separated from its neighbors by spaces or punctuation marks.
For example:

I
can you###Abc###could you
###abc##bc,##ab
I##can you,ibc
abds,b##,could you
could you,BC,bnm
me#Dc,nn

In the original post, the redundant spaces in these lines were highlighted in red and the independent words or phrases in yellow. Although "can you" and "could you" contain a space, the problem requires treating each of them as a whole, so each counts as a single, indivisible phrase.

Change all uppercase English letters in the original text to lowercase, except I;

So the requirement is to convert every letter except I to lowercase. This is not difficult, but note that the exception is the uppercase letter I only; a lowercase i is an ordinary letter and never counts as the word I. A big pitfall follows next.
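A minimal sketch of this step, assuming ASCII input (lowerExceptI is a hypothetical helper name). It keeps the I in words such as "Is" uppercase, which matches the sample answer "what Is this prime!":

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Lowercases every letter except the uppercase 'I'. std::tolower leaves
// non-letters unchanged; the unsigned char cast keeps the call well-defined.
std::string lowerExceptI(std::string s) {
    for (char& c : s)
        if (c != 'I')
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}
```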

Replace all independent "can you" and "could you" in the original text with "I can" and "I could" respectively; replace all independent "I" and "me" in the original text with "you";

These two sentences hide a big pitfall; a few examples make it clear.
First, some examples for test point 2. If you fail that test point, try these:
Test 1

1
could#youI

correct output

could#youI
AI: could#youI

test 2

1
could#youI#Ican#you#Ime#MEI#Icound#meyou

correct output

could#youI#Ican#you#Ime#MEI#Icound#meyou
AI:#could#youI#Ican#you#Ime#meI#Icound#meyou

In fact, if you get the definition of independent words and phrases exactly right, the examples above will not go wrong: they test whether you correctly decide what is an independent word or phrase, so that check must be rigorous. If it is, these cases are generally fine.
Next comes the big pitfall, again illustrated with examples.
If test point 1 fails, you can try the following test data.

1 
can#me

correct output

can#me
AI: can#you

error output

can#me
AI: I#can

Let's analyze where the error output comes from.
The original text is "can me". By the rules, "me" is replaced with "you", so "can me" becomes "can you", and then "can you" is replaced with "I can". This double replacement is not allowed: every replacement must be based on the original text, and text that has already been replaced must not be replaced again. Let's look at another example.

1
can#you

correct output

can#you
AI: I#can

error output

can#you
AI: you#can

Let's analyze this error output too: the original "can you" was replaced with "I can", and then the "I" in "I can" was replaced with "you", giving "you can". Repeated replacement is not allowed, so we must avoid this pitfall.
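One way to rule out double replacement by construction is a single left-to-right pass: every candidate is matched against the original (already space-normalized and lowercased) text, and the replacement goes into a separate output string that is never scanned again. This is a different technique from the one used in the code later in this post, which temporarily writes an uppercase YOU; a sketch, with replaceIndependent as a hypothetical name:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Letters and digits form words in this problem.
static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Single pass over `s`: matches are tested against the original text only,
// and replacements go to `out`, which is never rescanned. So "can me" becomes
// "can you" but never "I can", and "can you" becomes "I can" but never "you can".
std::string replaceIndependent(const std::string& s) {
    static const std::pair<std::string, std::string> rules[] = {
        {"can you", "I can"}, {"could you", "I could"}, {"I", "you"}, {"me", "you"}};
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        bool replaced = false;
        // A match can only start where the previous character is not part of a word.
        if (i == 0 || !isWordChar(s[i - 1])) {
            for (const auto& rule : rules) {
                const std::string& from = rule.first;
                std::size_t end = i + from.size();
                // compare() returns 0 only if the whole pattern fits and matches;
                // the character after it must not continue a word.
                if (s.compare(i, from.size(), from) == 0 &&
                    (end >= s.size() || !isWordChar(s[end]))) {
                    out += rule.second;
                    i = end;
                    replaced = true;
                    break;
                }
            }
        }
        if (!replaced) out += s[i++];
    }
    return out;
}
```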

Here is a more unusual example. If test point 4 fails, you can also try this test data:

1
}7`@ir%>kaV&I2X

correct output

}7`@ir%>kaV&I2X
AI: }7`@ir%>kav&I2x

error output

}7`@ir%>kaV&I2X
AI: }7`@ir%>kaV&you2X

This example contains an I that is preceded by a symbol but followed by a digit. As explained above, letters and digits that are adjacent form a single word in this problem, so "I2X" is one word, and the I inside it is not an independent word: it must not be replaced.
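The independence test described above fits in a small predicate: a match is independent when the character just before it and the character just after it (where they exist) are neither letters nor digits. A sketch; isIndependentAt is a hypothetical name:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical predicate: is s[pos, pos + len) an independent word or phrase?
// Letters and digits count as word characters; everything else separates words.
bool isIndependentAt(const std::string& s, std::size_t pos, std::size_t len) {
    auto isWordChar = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; };
    bool leftOk = (pos == 0) || !isWordChar(s[pos - 1]);                 // nothing word-like before
    bool rightOk = (pos + len >= s.size()) || !isWordChar(s[pos + len]); // nothing word-like after
    return leftOk && rightOk;
}
```

For the line above, the I is followed by the digit 2, so the predicate returns false and the I is left alone.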
Then we look at the next requirement.

Replace all question marks ? in the original text with exclamation marks !;

This one is easy: find every question mark and replace it. Remember that all question marks must be replaced; none may be missed.
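In C++ this is a one-liner with std::replace from <algorithm>, which rewrites every '?' in a single linear pass. The loop used in the code later in this post calls find("?") from the start of the string on every iteration, which is quadratic in the worst case; with at most 1000 characters per line that is still fast, so this is about simplicity rather than necessity. A sketch with a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Replaces every '?' with '!' in one pass over the string.
std::string questionToExclamation(std::string s) {
    std::replace(s.begin(), s.end(), '?', '!');
    return s;
}
```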

Output the replaced sentence in one line as the AI's answer.

The last requirement is the output format: prefix the processed sentence with "AI: " (AI, a colon, and one space) and print it.
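Putting the steps together, here is a compact sketch of the whole transformation as one function. It is not the author's code below: it uses a look-ahead space rule and a single-pass replacement so that no text is ever replaced twice. buildAnswer and its helpers are hypothetical names, and ASCII input is assumed:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

namespace {
bool isWordChar(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

// Trim, collapse space runs, and drop spaces before punctuation: a run of
// spaces survives as one space only between emitted text and a word char.
std::string normalizeSpaces(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        if (s[i] != ' ') { out += s[i++]; continue; }
        std::size_t j = i;
        while (j < s.size() && s[j] == ' ') ++j;
        if (!out.empty() && j < s.size() && isWordChar(s[j])) out += ' ';
        i = j;
    }
    return out;
}
}  // namespace

// Returns the AI's answer for one input line, without the "AI: " prefix.
std::string buildAnswer(const std::string& line) {
    std::string s = normalizeSpaces(line);
    for (char& c : s)  // lowercase everything except 'I'
        if (c != 'I') c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    std::replace(s.begin(), s.end(), '?', '!');

    // Single pass: match against s, append to out, never rescan out.
    static const std::pair<std::string, std::string> rules[] = {
        {"can you", "I can"}, {"could you", "I could"}, {"I", "you"}, {"me", "you"}};
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        bool replaced = false;
        if (i == 0 || !isWordChar(s[i - 1])) {
            for (const auto& r : rules) {
                std::size_t end = i + r.first.size();
                if (s.compare(i, r.first.size(), r.first) == 0 &&
                    (end >= s.size() || !isWordChar(s[end]))) {
                    out += r.second;
                    i = end;
                    replaced = true;
                    break;
                }
            }
        }
        if (!replaced) out += s[i++];
    }
    return out;
}
```

A main would then read N, skip the rest of the first line, and for each line print the line itself followed by "AI: " and buildAnswer(line).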
It is now 3:47 in the morning, and I have been writing for two hours without noticing. I am quite tired and there is still more to cover, so I will finish the rest tomorrow.
I resumed writing at 14:20 the next day.
That completes the analysis of the requirements. Next, let's talk about how to track down failing test points on this question, that is, how to find the special test cases behind them. The first step, of course, is to construct special cases by hand from the requirements. But some special cases are not obvious, and we may not think of them right away. We can search for information, but not every question has material available online. So what can we do? Since we cannot think up every special case ourselves, a large volume of test data may hit those cases by chance. Producing thousands or tens of thousands of inputs by hand, however, is slow and pointless. Tedious work like this is exactly what computers are good at, so we write a program that generates test data randomly.
Since we are dealing with strings, we first need a function that generates random strings:

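#include <cstdlib>
#include <string>
using namespace std;

string rand_str(const int len)  /* len: length of the string to generate */
{
	string str;                 /* holds the random string */
	char c;                     /* holds each randomly generated character */
	int idx;                    /* loop variable */
	/* append randomly generated characters to the string */
	for (idx = 0; idx < len; idx++)
	{
		/* rand() % 95 yields 0..94; adding 32 gives the printable ASCII characters 32..126 (space through '~'); see an ASCII table */
		c = 32 + rand() % 95;
		str.push_back(c);       /* push_back() appends a character to a std::string; here it appends c */
	}
	return str;                 /* return the generated random string */
}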

Next, we adapt our code by wrapping it in a shell that automatically generates test data and runs our solution on it.

Code that automatically generates test data and runs the tests:

#if 1			// automatic test-data generation and verification
#include <iostream>
#include <string>
#include <cctype>
#include <cstdlib>
#include <cmath>
#include <ctime>

using namespace std;

string stringProcessingFactory(string str);		// the "string factory": processes one line
string stringReplacement(string subStr, string oldStr, string newStr);	// replaces independent occurrences of a substring
bool testSource(int nN, string* strS);	// test interface
string rand_str(const int len);		// generates a random string

int main()
{
	int testTime;		// number of test rounds
	cin >> testTime;	// read the number of test rounds
	unsigned seed = (unsigned)time(0);	// use the current time as the random seed
	srand(seed);
	string* ptrStr = NULL;
	for (int i = 1; i <= testTime; i++)
	{
		// number of lines in this round: 1..10, matching the N <= 10 limit
		int testNum = 1 + rand() % 10;
		// allocate the test data dynamically
		ptrStr = new string[testNum];
		// fill each line with a random string; 15 limits the string length
		for (int j = 0; j < testNum; j++)
			ptrStr[j] = rand_str(15);
		// print which round this is
		cout << "No." << i << "  TestNum is " << testNum << endl;
		// feed the test data to the test interface
		testSource(testNum, ptrStr);
		// don't forget to free the memory
		delete[] ptrStr;
	}

	return 0;
}

string stringProcessingFactory(string str)
{
	// The "string factory": turns one input line into the AI's answer
	string tempStr;			// holds the processed string
	char lastChar = ' ';	// previous character examined; starting as a space lets the loop drop leading spaces
	for (auto it = str.begin(); it != str.end(); it++)
	{
		// casting to unsigned char keeps isalpha/isdigit/tolower well-defined for any char value
		unsigned char uc = (unsigned char)*it;
#if 0
		// Easier-to-read version
		// Handles cases like (# = space) ###abc , sf###c
		if (' ' == lastChar && ' ' == *it)	// previous and current are both spaces: redundant, skip it
			;				// empty statement
		// Previous char is a space, current is neither a space nor a letter/digit, and the result is not empty
		// Handles cases like (# = space) #,dat## ,abc##,dd
		else if (' ' == lastChar && ' ' != *it && !isalpha(uc) && !isdigit(uc) && !tempStr.empty())
		{
			// the current char is punctuation, so the space before it is redundant: pop it
			tempStr.pop_back();
			// lowercase every letter except I (letters that are already lowercase are unaffected)
			if (isalpha(uc) && 'I' != *it)
				*it = tolower(uc);
			// store the processed character
			tempStr.push_back(*it);
		}
		else	// everything else only needs lowercasing and storing
		{
			if (isalpha(uc) && 'I' != *it)
				*it = tolower(uc);
			tempStr.push_back(*it);
		}
#endif
#if 1
		// Refactored version
		if (' ' != lastChar || ' ' != *it)
		{
			if (' ' == lastChar && ' ' != *it && !isalpha(uc) && !isdigit(uc) && !tempStr.empty())
				tempStr.pop_back();
			if (isalpha(uc) && 'I' != *it)
				*it = tolower(uc);

			tempStr.push_back(*it);
		}
#endif
		// remember this character as the previous one for the next iteration
		lastChar = *it;
	}
	// if nothing is left, return an empty string
	if (tempStr.empty())
		return "";
	// drop a single trailing space, if any
	if (' ' == tempStr.back())
		tempStr.pop_back();
	// replace every ? with !, one at a time, until none is left
	while (tempStr.find("?") != string::npos)
		tempStr.replace(tempStr.find("?"), 1, "!");
	// Replace the qualifying substrings.
	// me is replaced with uppercase YOU, not you: otherwise "can me" would become "can you",
	// which the "can you" rule below would then turn into "I can". The same goes for I.
	// Uppercase YOU cannot collide with anything, because after the processing above
	// the only other uppercase letter left in the string is I.
	tempStr = stringReplacement("me", tempStr, "YOU");
	tempStr = stringReplacement("I", tempStr, "YOU");
	tempStr = stringReplacement("can you", tempStr, "I can");
	tempStr = stringReplacement("could you", tempStr, "I could");
	// after all replacements, turn YOU back into lowercase you
	for (auto it = tempStr.begin(); it != tempStr.end(); it++)
		if (isalpha((unsigned char)*it) && 'I' != *it)
			*it = tolower((unsigned char)*it);

	// return the processed string
	return tempStr;
}

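string stringReplacement(string subStr, string oldStr, string newStr)
{
	// Replaces every independent occurrence of subStr in oldStr with newStr
	int pos = 0;			// current comparison position in oldStr
	bool isFlag = true;		// whether subStr fully matches at position i
	// outer loop: compare each character of oldStr with the first character of subStr
	for (int i = 0; i < (int)oldStr.size(); i++)
		if (oldStr[i] == subStr.front())	// same first character: check whether the whole substring matches
		{
			// assume a match by default
			isFlag = true;
			pos = i;	// remember where the comparison starts
			// walk oldStr and subStr together
			for (int j = 0; j < (int)subStr.size(); j++)
			{
				// a mismatch part-way through (oldStr[oldStr.size()] is '\0', so pos never runs past the end)
				if (oldStr[pos] != subStr[j])
				{
					// not a match: stop comparing and return to the outer search
					isFlag = false;
					break;
				}
				// advance the comparison position
				pos++;
			}

#if 0
			// only a full match (isFlag == true) may be replaced
			if (isFlag && i > 0)
			{
				// for i > 0, replace only if neither the character before the match nor the one after it
				// is a letter or digit, i.e. only if the match is an independent word
				if (isFlag && !isalpha(oldStr[i - 1]) && !isalpha(oldStr[i + subStr.size()]) && !isdigit(oldStr[i - 1]) && !isdigit(oldStr[i + subStr.size()]))
				{
					oldStr.replace(i, subStr.size(), newStr);
					i = i + newStr.size();
				}
			}
			// for i == 0 the match starts the string and has no previous character, so only the character after it is checked
			if (isFlag && i == 0 && !isalpha(oldStr[i + subStr.size()]) && !isdigit(oldStr[i + subStr.size()]))
			{
				oldStr.replace(i, subStr.size(), newStr);
				i = i + newStr.size();
			}
#endif
#if 1
			// Refactored version: the character after the match must not be a letter or digit...
			if (isFlag && !isalpha((unsigned char)oldStr[i + subStr.size()]) && !isdigit((unsigned char)oldStr[i + subStr.size()]))
			{
				bool needFlag = false;
				if (i >= 0)
				{
					needFlag = true;
					// ...and, unless the match starts the string, neither may the character before it
					if (i != 0)
						if (isalpha((unsigned char)oldStr[i - 1]) || isdigit((unsigned char)oldStr[i - 1]))
							needFlag = false;
				}
				if (needFlag)
				{
					oldStr.replace(i, subStr.size(), newStr);
					// skip past the replacement; the loop's i++ also skips the separator after it,
					// which cannot start a new match because it is not a letter
					i = i + newStr.size();
				}
			}
#endif
		}
	// return the string with all replacements made
	return oldStr;
}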

bool testSource(int nN, string* strS)
{
	// Runs the solution on every line of one test round
	int n;		// number of test lines
	n = nN;		// store the number of test lines
	string* ptrStr = strS;	// the test data in memory
	// hand each line to the solution
	for (int i = 0; i < n; i++)
	{
		// print the original test line
		cout << "OLD: " << ptrStr[i] << endl;
		// print the processed line
		cout << "NAI: " << stringProcessingFactory(ptrStr[i]) << endl;
	}
	return true;
}

string rand_str(const int len)  /* len: length of the string to generate */
{
	// Generates a random string
	string str;                 /* holds the random string */
	char c;                     /* holds each randomly generated character */
	int idx;                    /* loop variable */
	/* append randomly generated characters to the string */
	for (idx = 0; idx < len; idx++)
	{
		/* rand() % 95 yields 0..94; adding 32 gives the printable ASCII characters 32..126 (space through '~') */
		c = 32 + rand() % 95;
		str.push_back(c);       /* append the random character c */
	}
	return str;                 /* return the generated random string */
}

#endif

At this point we have code that generates test data and tests automatically. Run it, enter a number such as 100, and it simulates a hundred test rounds on random data. But it soon becomes clear that this alone is not very useful. We do get a huge amount of test data, which is what we wanted, and we can run plenty of tests on it. But is the program's output correct? If not, which input made it go wrong? That is the information we need most, and the current test program cannot tell us. So what now?
To solve this, we first save the program's output to a file and then compare it with a file containing the standard output. It is like taking an exam: you write your answers on the answer sheet and then check them against the answer key. That raises another question: where do we get the answer key? The official test data is practically impossible to obtain. Since we cannot get the standard answers, we make them ourselves: find code that passes all test points (AC code) online, or write a straightforward, even brute-force, solution yourself. Then write a program that generates random test data, feeds it to that AC code, and saves both the inputs and the AC code's outputs. That gives us a home-made answer key. As for the special cases, with a large enough volume of random data, similar special cases are likely to show up too.
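The approach below goes through files. If your own function and the reference solution can be compiled into the same program, the comparison can also be done in memory. A minimal sketch, where Solver and findFirstMismatch are hypothetical names and each solver maps one input line to its answer:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A solver maps one input line to the AI's answer.
using Solver = std::function<std::string(const std::string&)>;

// Returns the index of the first input on which `candidate` disagrees with
// `reference`, or -1 if they agree on every input.
int findFirstMismatch(const std::vector<std::string>& inputs,
                      const Solver& candidate, const Solver& reference) {
    for (std::size_t i = 0; i < inputs.size(); ++i)
        if (candidate(inputs[i]) != reference(inputs[i]))
            return static_cast<int>(i);
    return -1;
}
```

When the result is not -1, printing inputs[result] gives the exact counterexample to debug.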

Code that automatically generates test data, runs the reference (AC) code on it, and records both the data and the results:

#include <iostream>
#include <cstring>
#include <string>
#include <cctype>
#include <cstdlib>
#include <cmath>
#include <fstream>
#include <ctime>

using namespace std;

#define Change s2[len2++]='y',s2[len2++]='o',s2[len2++]='u';continue;  // writes "you" for a replaced I or me

bool testSource(int nN, string* strS);		// test interface
string rand_str(const int len);			// generates a random string

int main()
{
	int testTime;		// number of test rounds
	cin >> testTime;	// read the number of test rounds
	unsigned seed = (unsigned)time(0);	// use the system time as the random seed
	srand(seed);
	string* ptrStr;
	fstream foi("./in_good.txt", ios::out);	// file that records the generated input

	for (int i = 1; i <= testTime; i++)
	{
		// number of lines in this round: 1..10, matching the N <= 10 limit
		int testNum = 1 + rand() % 10;
		// allocate the test data dynamically
		ptrStr = new string[testNum];
		// each line gets 1..14 random tokens (the problem guarantees non-empty lines, so length 0 is avoided)
		for (int j = 0; j < testNum; j++)
			ptrStr[j] = rand_str(1 + rand() % 14);
		// write N to the input file
		foi << testNum << endl;
		// then write the lines themselves
		for (int k = 0; k < testNum; k++)
			foi << ptrStr[k] << endl;
		// run the reference solution on this round
		testSource(testNum, ptrStr);
		// never forget to free the memory
		delete[] ptrStr;
	}
	foi.close();
	return 0;
}

bool testSource(int nN, string* strS)
{
	int n;		// number of test lines in this round
	n = nN;		// store the number of test lines
	string* ptrStr = strS;
	/***    Below: open-source AC code, used to produce the standard answers    ***/
	char s1[10001], s2[10001]; // the buffers must be large enough, or some test points fail
	int num, flag;
	/*cin >> num;
	getchar();*/
	num = n;
	int i = 0;
	fstream foi("D:/out_good.txt", ios::app); // file that records the standard output
	//  process the data num times
	while (num--)
	{
		int len1, len2 = 0;
		//cin.getline(s1, 1001);
		std::strcpy(s1, ptrStr[i].c_str());
		i++;
		foi << s1 << endl;
		std::cout << s1 << endl;

		/*  remove leading and trailing spaces  */
		for (flag = 0; flag < (int)strlen(s1); flag++) if (s1[flag] != ' ') break;
		// source and destination overlap, so memmove is required (strcpy on overlapping buffers is undefined behavior)
		std::memmove(s1, s1 + flag, strlen(s1 + flag) + 1);
		for (flag = (int)strlen(s1) - 1; flag >= 0; flag--) if (s1[flag] != ' ') break;
		s1[flag + 1] = '\0';

		//  first pass: drop redundant spaces, convert uppercase to lowercase, ? -> !
		len1 = strlen(s1);
		for (int z = 0; z < len1; z++)
		{
			if (s1[z] >= 'A' && s1[z] <= 'Z' && s1[z] != 'I') s1[z] += 32;
			if (s1[z] == '?') s1[z] = '!';
			// a space followed by a non-alphanumeric character (another space or punctuation) is dropped
			if (s1[z] == ' ' && isalnum(s1[z + 1]) == 0) continue;
			s2[len2++] = s1[z];
		}
		s2[len2] = '\0';
		std::strcpy(s1, s2);  // s1 = s2
		len1 = len2; len2 = 0;

		/*  core replacement pass  */
		for (int z = 0; z < len1; z++)
		{
			// the character before z is not a letter or digit; z == 0 is tested first so s1[-1] is never read
			bool leftFree = (z == 0 || isalnum(s1[z - 1]) == 0);
			// independent I
			if (s1[z] == 'I' && leftFree && isalnum(s1[z + 1]) == 0) {
				Change;
			}
			// independent me
			if (strstr(s1 + z, "me") == s1 + z && leftFree && isalnum(s1[z + 2]) == 0) {
				z++; Change
			}
			// independent can you
			if (strstr(s1 + z, "can you") == s1 + z && leftFree && isalnum(s1[z + 7]) == 0) {
				s2[len2++] = 'I', s2[len2++] = ' ', s2[len2++] = 'c', s2[len2++] = 'a', s2[len2++] = 'n';
				z += 6; continue;
			}
			// independent could you
			if (strstr(s1 + z, "could you") == s1 + z && leftFree && isalnum(s1[z + 9]) == 0) {
				s2[len2++] = 'I', s2[len2++] = ' ', s2[len2++] = 'c', s2[len2++] = 'o', s2[len2++] = 'u', s2[len2++] = 'l', s2[len2++] = 'd';
				z += 8; continue;
			}
			// none of the above: copy the character unchanged
			s2[len2++] = s1[z];
		}
		//  output the AI's answer
		s2[len2] = '\0';
		// write the AC code's output to the standard-output file
		foi << "AI: " << s2 << endl;
		std::cout << "AI: " << s2 << endl;
	}
	foi.close();
	return true;
}

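#if 0 // first-generation random string function
string rand_str(const int len)  /* len: length of the string to generate */
{
	string str;                 /* holds the random string */
	char c;                     /* holds each randomly generated character */
	int idx;                    /* loop variable */
	/* append randomly generated characters to the string */
	for (idx = 0; idx < len; idx++)
	{
		/* rand() % 95 yields 0..94; adding 32 gives the printable ASCII characters 32..126 (space through '~') */
		c = 32 + rand() % 95;
		str.push_back(c);       /* append the random character c */
	}
	return str;                 /* return the generated random string */
}
#endif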

// New random string generator
string rand_str(const int len)  /* len: number of tokens to concatenate */
{
	string str = "";	/* holds the random string */
	// token table: the four special words/phrases first, then single characters
	string word_str[99] = { "I", "me", "can you", "could you" };
	// entries 4..98 become the one-character strings 32..126 (space through '~')
	for (int i = 4; i < 99; i++)
		word_str[i] = string(1, (char)(28 + i));
	/* append len randomly chosen tokens */
	for (int idx = 0; idx < len; idx++)
		str += word_str[rand() % 99];
	return str;		/* return the generated random string */
}

Careful readers will have spotted a change: the random string function now looks like this:

// New random string generator
string rand_str(const int len)  /* len: number of tokens to concatenate */
{
	string str = "";	/* holds the random string */
	// token table: the four special words/phrases first, then single characters
	string word_str[99] = { "I", "me", "can you", "could you" };
	// entries 4..98 become the one-character strings 32..126 (space through '~')
	for (int i = 4; i < 99; i++)
		word_str[i] = string(1, (char)(28 + i));
	/* append len randomly chosen tokens */
	for (int idx = 0; idx < len; idx++)
		str += word_str[rand() % 99];
	return str;		/* return the generated random string */
}

The random string is now assembled from tokens drawn from an array of strings, which lets us control its content much more freely. For example, producing strings such as "me", "can you", or "could you" one random character at a time is very unlikely; with this change, the patterns this question cares about appear far more often. OK, let's continue.
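The same token idea can also be written with the C++11 <random> facilities instead of rand(): std::mt19937 with std::uniform_int_distribution avoids the modulo bias of rand() % k, and a fixed seed makes a failing batch reproducible. A sketch that writes one test case in the problem's input format; writeTestCase and the token list are assumptions, not the author's code:

```cpp
#include <cassert>
#include <ostream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Writes one test case in the problem's input format: N (1..10), then N lines.
void writeTestCase(std::ostream& out, std::mt19937& gen) {
    // Assumed token list: the special phrases (also in uppercase, to exercise
    // lowercasing before replacement), runs of spaces, punctuation, and filler.
    static const std::vector<std::string> tokens = {
        "I", "me", "can you", "could you", "Can You", "ME",
        " ", "   ", "?", ",", "'", "a", "Z", "5"};
    std::uniform_int_distribution<int> lineCount(1, 10);   // N <= 10
    std::uniform_int_distribution<int> tokenCount(1, 12);  // tokens per line
    std::uniform_int_distribution<int> pick(0, static_cast<int>(tokens.size()) - 1);
    int n = lineCount(gen);
    out << n << '\n';
    for (int i = 0; i < n; ++i) {
        std::string line;
        int k = tokenCount(gen);
        for (int t = 0; t < k; ++t) line += tokens[pick(gen)];
        out << line << '\n';
    }
}
```

Seeding the generator with a value you print first (for example one taken from std::random_device) lets you regenerate exactly the batch that exposed a bug.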
Now we have a file recording a large amount of random test data and a file with the standard output. Next we modify our own program so that it reads its input from the generated input file and writes its results to a file, which makes our mistakes much easier to find.
The code that implements this is as follows:

#if 1

#include <iostream>
#include <string>
#include <cctype>
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <fstream>

using namespace std;

string stringProcessingFactory(string str);		// the "string factory": processes one line
string stringReplacement(string subStr, string oldStr, string newStr);	// replaces independent occurrences of a substring
bool testSource(int nN, string* strS);	// test interface
string rand_str(const int len);		// generates a random string

int main()
{
	int testTime;		// number of test rounds to read from the input file
	cin >> testTime;	// read the number of test rounds
	srand((unsigned)time(0));	// seed the generator with the current time (rand_str is not called in this version)
	string* ptrStr = NULL;
	string tempTestNum = "";
	// must be the same file the generator program wrote (it wrote ./in_good.txt relative to where it ran)
	fstream foi("D:/WorkSpace/DevC++/in_good.txt", ios::in);
	for (int i = 1; i <= testTime; i++)
	{
		// read N for this round from the file; stop if the file has fewer rounds than requested
		if (!getline(foi, tempTestNum))
			break;
		// it was read as a string, so convert it
		int testNum = stoi(tempTestNum);
		// allocate the test data dynamically
		ptrStr = new string[testNum];
		// read the N lines of this round
		for (int j = 0; j < testNum; j++)
			getline(foi, ptrStr[j]);
		// feed the test data to the test interface
		testSource(testNum, ptrStr);
		// don't forget to free the memory
		delete[] ptrStr;
	}

	return 0;
}

string stringProcessingFactory(string str)
{
	// The "string factory": turns one input line into the AI's answer
	string tempStr;			// holds the processed string
	char lastChar = ' ';	// previous character examined; starting as a space lets the loop drop leading spaces
	for (auto it = str.begin(); it != str.end(); it++)
	{
		// casting to unsigned char keeps isalpha/isdigit/tolower well-defined for any char value
		unsigned char uc = (unsigned char)*it;
#if 0
		// Easier-to-read version
		// Handles cases like (# = space) ###abc , sf###c
		if (' ' == lastChar && ' ' == *it)	// previous and current are both spaces: redundant, skip it
			;				// empty statement
		// Previous char is a space, current is neither a space nor a letter/digit, and the result is not empty
		// Handles cases like (# = space) #,dat## ,abc##,dd
		else if (' ' == lastChar && ' ' != *it && !isalpha(uc) && !isdigit(uc) && !tempStr.empty())
		{
			// the current char is punctuation, so the space before it is redundant: pop it
			tempStr.pop_back();
			// lowercase every letter except I (letters that are already lowercase are unaffected)
			if (isalpha(uc) && 'I' != *it)
				*it = tolower(uc);
			// store the processed character
			tempStr.push_back(*it);
		}
		else	// everything else only needs lowercasing and storing
		{
			if (isalpha(uc) && 'I' != *it)
				*it = tolower(uc);
			tempStr.push_back(*it);
		}
#endif
#if 1
		// Refactored version
		if (' ' != lastChar || ' ' != *it)
		{
			if (' ' == lastChar && ' ' != *it && !isalpha(uc) && !isdigit(uc) && !tempStr.empty())
				tempStr.pop_back();
			if (isalpha(uc) && 'I' != *it)
				*it = tolower(uc);

			tempStr.push_back(*it);
		}
#endif
		// remember this character as the previous one for the next iteration
		lastChar = *it;
	}
	// if nothing is left, return an empty string
	if (tempStr.empty())
		return "";
	// drop a single trailing space, if any
	if (' ' == tempStr.back())
		tempStr.pop_back();
	// replace every ? with !, one at a time, until none is left
	while (tempStr.find("?") != string::npos)
		tempStr.replace(tempStr.find("?"), 1, "!");
	// Replace the qualifying substrings.
	// me is replaced with uppercase YOU, not you: otherwise "can me" would become "can you",
	// which the "can you" rule below would then turn into "I can". The same goes for I.
	// Uppercase YOU cannot collide with anything, because after the processing above
	// the only other uppercase letter left in the string is I.
	tempStr = stringReplacement("me", tempStr, "YOU");
	tempStr = stringReplacement("I", tempStr, "YOU");
	tempStr = stringReplacement("can you", tempStr, "I can");
	tempStr = stringReplacement("could you", tempStr, "I could");
	// after all replacements, turn YOU back into lowercase you
	for (auto it = tempStr.begin(); it != tempStr.end(); it++)
		if (isalpha((unsigned char)*it) && 'I' != *it)
			*it = tolower((unsigned char)*it);

	// return the processed string
	return tempStr;
}

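string stringReplacement(string subStr, string oldStr, string newStr)
{
	// Replaces every independent occurrence of subStr in oldStr with newStr
	int pos = 0;			// current comparison position in oldStr
	bool isFlag = true;		// whether subStr fully matches at position i
	// outer loop: compare each character of oldStr with the first character of subStr
	for (int i = 0; i < (int)oldStr.size(); i++)
		if (oldStr[i] == subStr.front())	// same first character: check whether the whole substring matches
		{
			// assume a match by default
			isFlag = true;
			pos = i;	// remember where the comparison starts
			// walk oldStr and subStr together
			for (int j = 0; j < (int)subStr.size(); j++)
			{
				// a mismatch part-way through (oldStr[oldStr.size()] is '\0', so pos never runs past the end)
				if (oldStr[pos] != subStr[j])
				{
					// not a match: stop comparing and return to the outer search
					isFlag = false;
					break;
				}
				// advance the comparison position
				pos++;
			}

#if 0
			// only a full match (isFlag == true) may be replaced
			if (isFlag && i > 0)
			{
				// for i > 0, replace only if neither the character before the match nor the one after it
				// is a letter or digit, i.e. only if the match is an independent word
				if (isFlag && !isalpha(oldStr[i - 1]) && !isalpha(oldStr[i + subStr.size()]) && !isdigit(oldStr[i - 1]) && !isdigit(oldStr[i + subStr.size()]))
				{
					oldStr.replace(i, subStr.size(), newStr);
					i = i + newStr.size();
				}
			}
			// for i == 0 the match starts the string and has no previous character, so only the character after it is checked
			if (isFlag && i == 0 && !isalpha(oldStr[i + subStr.size()]) && !isdigit(oldStr[i + subStr.size()]))
			{
				oldStr.replace(i, subStr.size(), newStr);
				i = i + newStr.size();
			}
#endif
#if 1
			// Refactored version: the character after the match must not be a letter or digit...
			if (isFlag && !isalpha((unsigned char)oldStr[i + subStr.size()]) && !isdigit((unsigned char)oldStr[i + subStr.size()]))
			{
				bool needFlag = false;
				if (i >= 0)
				{
					needFlag = true;
					// ...and, unless the match starts the string, neither may the character before it
					if (i != 0)
						if (isalpha((unsigned char)oldStr[i - 1]) || isdigit((unsigned char)oldStr[i - 1]))
							needFlag = false;
				}
				if (needFlag)
				{
					oldStr.replace(i, subStr.size(), newStr);
					// skip past the replacement; the loop's i++ also skips the separator after it,
					// which cannot start a new match because it is not a letter
					i = i + newStr.size();
				}
			}
#endif
#if 0		// has a bug!!!
			// Refactored version (buggy)
			if (isFlag && !isalpha(oldStr[i + subStr.size()]) && !isdigit(oldStr[i + subStr.size()]))
			{
				bool needFlag = false;
				if (i >= 0)
				{
					needFlag = true;
					if (i != 0)
						// this condition should be isalpha(oldStr[i - 1]) || isdigit(oldStr[i - 1]);
						// with &&, a digit before the match (as in "2me") is wrongly treated as a word boundary
						if (isalpha(oldStr[i - 1]) && !isdigit(oldStr[i - 1]))
							needFlag = false;
				}
				if (needFlag)
				{
					oldStr.replace(i, subStr.size(), newStr);
					i = i + newStr.size();
				}
			}
#endif
		}
	// return the string with all replacements made
	return oldStr;
}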

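bool testSource(int nN, string* strS)
{
	// Runs our solution on every line of one round and records the results
	int n;		// number of test lines
	n = nN;		// store the number of test lines
	string* ptrStr = strS;	// the test data in memory
	fstream foi("D:/out_test.txt", ios::app);	// file that records our program's output
	// hand each line to the solution
	for (int i = 0; i < n; i++)
	{
		// process the line once, then print the result and save it to the file
		string answer = stringProcessingFactory(ptrStr[i]);
		// echo the original test line
		foi << ptrStr[i] << endl;
		cout << ptrStr[i] << endl;
		// output the processed line as the AI's answer
		foi << "AI: " << answer << endl;
		cout << "AI: " << answer << endl;
	}
	foi.close();	// close the file
	return true;
}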

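#if 0 // First-generation random string function
string rand_str(const int len)  /* parameter: length of the string */
{
	/* initialization */
	string str;                 /* str holds the random string */
	char c;                     /* c holds each randomly generated character */
	int idx;                    /* loop variable */
	/* append randomly generated characters to the string */
	for (idx = 0; idx < len; idx++)
	{
		/* rand() % 95 gives 0..94; adding 32 gives 32..126, the printable ASCII characters (see an ASCII table) */
		c = 32 + rand() % 95;
		str.push_back(c);       /* push_back() appends to a std::string; here it appends the random character c */
	}
	return str;                 /* return the generated random string */
}
#endif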

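string rand_str(const int len)  /* parameter: number of tokens to append (the string can be longer, since some tokens have several characters) */
{
	/* initialization */
	string str = "";                 /* str holds the random string */
	string word_str[99] = { "I", "me", "can you", "could you" };
	/* the remaining 95 entries are the single printable ASCII characters 32..126 */
	for (int i = 4; i < 99; i++)
		word_str[i] = static_cast<char>(28 + i);
	//char c;                     /* c held each randomly generated character */
	int idx;                    /* loop variable */
	/* append randomly chosen tokens to the string */
	for (idx = 0; idx < len; idx++)
	{
		/* pick one of the 99 tokens: one of the four keywords or a printable character */
		//c = word_str[rand() % 94];
		//str.push_back(c);       /* push_back() appends the random character c */
		str += word_str[rand() % 99];
	}
	return str;                 /* return the generated random string */
}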

#endif
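A small aside on the generator above: because rand_str draws from rand(), getting the same failing string back later depends on how srand was seeded. One way to make every batch reproducible is to pass an explicitly seeded engine around. Below is a minimal sketch, assuming C++11 <random>; randStrSeeded is a hypothetical name, and its token pool mirrors word_str (the four keywords plus printable ASCII 32 to 126).

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical variant of rand_str: same token pool as word_str (the four
// keywords plus every printable ASCII character 32..126), but driven by a
// caller-supplied, explicitly seeded engine so any batch can be regenerated.
std::string randStrSeeded(int len, std::mt19937& gen)
{
    std::vector<std::string> pool = {"I", "me", "can you", "could you"};
    for (int c = 32; c <= 126; ++c)
        pool.push_back(std::string(1, static_cast<char>(c)));

    // Uniform choice over all 99 tokens, playing the role of rand() % 99
    std::uniform_int_distribution<std::size_t> pick(0, pool.size() - 1);
    std::string s;
    for (int k = 0; k < len; ++k)
        s += pool[pick(gen)];   // len counts tokens, not characters
    return s;
}
```

For example, creating std::mt19937 gen(12345) and calling randStrSeeded(30, gen) repeatedly gives the same sequence of strings on every run built with the same compiler and standard library, so a case whose output turns out to differ from the reference can be regenerated just by reusing its seed.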

OK, now we can get the output file that our program produces from the generated input.
But now we face a new problem: there is far too much data. Let's open the file and take a look.
Checking these densely packed strings by hand to find the differences would be a huge challenge. Tedious, meticulous work like this is exactly what computers are for, right? So how can we get the computer to do this unreasonable job for us?
The Windows cmd shell has a command called FC (File Compare) that compares two text files and lists the places where they differ.
As shown in the screenshot (using the FC command to find the differences).
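FC is Windows-only. If you are on another system, or would rather have the differing line numbers handed back to your own code, the same comparison can be sketched in C++. This is a minimal line-by-line comparison, not a reimplementation of FC (FC can resynchronize after inserted or deleted lines, which this does not); diffLines is a hypothetical name.

```cpp
#include <cstddef>
#include <fstream>   // std::ifstream, for comparing real files
#include <istream>
#include <sstream>   // std::istringstream, for quick in-memory checks
#include <string>
#include <vector>

// Hypothetical helper: compares two text streams line by line and returns
// the 1-based numbers of the lines that differ, including lines that exist
// in only one of the two streams. It does not resync after an inserted or
// deleted line; it simply compares line k with line k.
std::vector<std::size_t> diffLines(std::istream& a, std::istream& b)
{
    std::vector<std::size_t> diffs;
    std::string la, lb;
    std::size_t lineNo = 0;
    while (true)
    {
        bool okA = static_cast<bool>(std::getline(a, la));
        bool okB = static_cast<bool>(std::getline(b, lb));
        if (!okA && !okB)
            break;                      // both streams are exhausted
        ++lineNo;
        if (okA != okB || la != lb)
            diffs.push_back(lineNo);    // contents differ, or one stream ended early
    }
    return diffs;
}
```

A plain line-by-line comparison is enough here because the generated inputs contain no newlines, so every test case produces exactly two output lines (the echoed input and the AI: line). Open out_test.txt and the reference output with std::ifstream, pass both to diffLines, and a reported line k belongs to test case (k + 1) / 2.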
Then, using the lines where our output differs from the standard answer, we can pick out those special test cases and test our program with them, improving it step by step until we finally break through those seemingly invincible test points. Good luck, everyone!
PS: Readers who look at my code carefully may notice that the test data for this problem is not very rigorous and has loopholes: some incorrect code still passes. While refactoring part of the code above, I accidentally got the logic wrong, yet it still passed. The flawed code is as follows.

// Flawed!!!
			// Refactored version
			if (isFlag && !isalpha(oldStr[i + subStr.size()]) && !isdigit(oldStr[i + subStr.size()]))
			{
				bool needFlag = false;
				if (i >= 0)
				{
					needFlag = true;
					if (i != 0)
						// The condition here should be isalpha(oldStr[i - 1]) || isdigit(oldStr[i - 1])
						if (isalpha(oldStr[i - 1]) && !isdigit(oldStr[i - 1]))
							needFlag = false;
				}
				if (needFlag)
				{
					oldStr.replace(i, subStr.size(), newStr);
					i = i + newStr.size();
				}
			}
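The flaw in the snippet above comes down to one boolean identity: in isalpha(c) && !isdigit(c), the second test is always true whenever the first one is, because no character is both a letter and a digit. The condition therefore collapses to isalpha(c), and a digit right before the match no longer blocks the replacement. A minimal sketch of the two checks side by side (prevCharOk and prevCharOkFlawed are hypothetical names):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Correct check: the character just before a match at position i must be
// neither a letter nor a digit (equivalently, !std::isalnum(c)).
bool prevCharOk(const std::string& s, std::size_t i)
{
    if (i == 0)
        return true;                      // the match starts the line
    unsigned char c = static_cast<unsigned char>(s[i - 1]);
    return !(std::isalpha(c) || std::isdigit(c));
}

// Flawed check from the snippet: isalpha(c) && !isdigit(c) is just
// isalpha(c), so a preceding digit is wrongly treated as a word boundary.
bool prevCharOkFlawed(const std::string& s, std::size_t i)
{
    if (i == 0)
        return true;
    unsigned char c = static_cast<unsigned char>(s[i - 1]);
    return !(std::isalpha(c) && !std::isdigit(c));
}
```

For "0I" with the match at index 1, prevCharOk returns false while prevCharOkFlawed returns true; that digit-before-keyword pattern is exactly what each of the test cases below contains.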

Test data that exposes the flaw: code containing this bug fails on each of the following inputs.
Test 1

1
M5#gdvcan youCT9kiwm{3/aJcWO0I

Incorrect output:

M5#gdvcan youCT9kiwm{3/aJcWO0I
AI: m5#gdvcan youct9kiwm{3/ajcwo0you

Correct output:

M5#gdvcan youCT9kiwm{3/aJcWO0I
AI: m5#gdvcan youct9kiwm{3/ajcwo0I

Test 2

1
rcould youZ#could youX/y4me&i(|&"Lc|Z

Incorrect output:

rcould youZ#could youX/y4me&i(|&"Lc|Z
AI: rcould youz#could youx/y4you&i(|&"lc|z

Correct output:

rcould youZ#could youX/y4me&i(|&"Lc|Z
AI: rcould youz#could youx/y4me&i(|&"lc|z

Test 3

1
s"&zd3Q`$[Fcan youxme$.Xk.Q;'xSI8can you

Incorrect output:

s"&zd3Q`$[Fcan youxme$.Xk.Q;'xSI8can you
AI: s"&zd3q`$[fcan youxme$.xk.q;'xsI8I can

Correct output:

s"&zd3Q`$[Fcan youxme$.Xk.Q;'xSI8can you
AI: s"&zd3q`$[fcan youxme$.xk.q;'xsI8can you
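The three correct outputs can be cross-checked against a small standalone version of the replacement step. This sketch uses a different technique from the replace-in-place loop above: it scans the line once and writes into a new string, so text produced by one replacement (for example the I in "I can") is never matched again by a later rule. It assumes the line has already been lowercased (except I), had its spaces cleaned up and its ? turned into !; replaceIndependent and isBoundary are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if position p is past the end of s or holds a
// character that is neither a letter nor a digit (it cannot extend a word).
static bool isBoundary(const std::string& s, std::size_t p)
{
    if (p >= s.size())
        return true;
    return !std::isalnum(static_cast<unsigned char>(s[p]));
}

// Hypothetical single-pass replacement of independent "can you",
// "could you", "I" and "me", assuming the rest of the processing is done.
std::string replaceIndependent(const std::string& s)
{
    static const std::string from[] = {"can you", "could you", "I", "me"};
    static const std::string to[]   = {"I can", "I could", "you", "you"};
    std::string out;
    std::size_t i = 0;
    while (i < s.size())
    {
        bool replaced = false;
        for (int k = 0; k < 4 && !replaced; ++k)
        {
            const std::string& w = from[k];
            bool leftOk = (i == 0) || isBoundary(s, i - 1);
            if (s.compare(i, w.size(), w) == 0 && leftOk &&
                isBoundary(s, i + w.size()))
            {
                out += to[k];   // emitted into a new string, never re-scanned
                i += w.size();
                replaced = true;
            }
        }
        if (!replaced)
            out += s[i++];      // copy the character unchanged
    }
    return out;
}
```

Fed the processed versions of the three inputs above, it returns them unchanged, since every keyword there is glued to a letter or digit; fed "could you show me 5" it returns "I could show you 5", matching the sample.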

Well, that's about it for this article. Thanks for reading! Writing it took a lot of work, so if it helped you, please give it a like. Your support is what keeps me going.


Source: blog.csdn.net/m0_52072919/article/details/124893295