Counting distinct and repeated words in a text file with Java

import java.io.*;
import java.util.*;
import java.nio.file.*;
public class Duqudata {
	public static void main(String[] args){
		String token="";   
		Path path=Paths.get("D:\\out.txt");
		try(InputStream input=Files.newInputStream(path, StandardOpenOption.READ);
				Scanner sc=new Scanner(input)){
					while(sc.hasNext()){
						 token +=sc.next()+" ";				
					}
				}catch(IOException e){
					e.printStackTrace();
				}		
		String[] str=token.split("[ ,.]");	// note: the character class contains a space
		Set<String> hs=new HashSet<>();
		Set<String> se=new HashSet<>();	
		for(String s:str){
			// no empty-string check in this version, so "" can end up in the sets
			if(!hs.add(s)){          // add() returns false if s is already in the set
				se.add(s);
			}
		}
		System.out.println("Distinct words: "+hs);
		System.out.println("Repeated words: "+se);
	}
}
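The duplicate check above relies on Set.add returning false when the element is already in the set. Below is a minimal sketch of just that idea, run on a hard-coded word array instead of a file. The class DuplicateDemo and the method classify are illustrative names, not part of the original code; LinkedHashSet is used only so the printed order follows first appearance.

```java
import java.util.*;

public class DuplicateDemo {
    // Fills `distinct` with every word seen and `repeated` with words seen more than once.
    static void classify(String[] words, Set<String> distinct, Set<String> repeated) {
        for (String w : words) {
            if (!distinct.add(w)) {   // add() returned false: w was already present
                repeated.add(w);      // a set, so a word repeated 3+ times is stored once
            }
        }
    }

    public static void main(String[] args) {
        String[] words = {"to", "be", "or", "not", "to", "be"};
        Set<String> distinct = new LinkedHashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        classify(words, distinct, repeated);
        System.out.println("Distinct words: " + distinct); // Distinct words: [to, be, or, not]
        System.out.println("Repeated words: " + repeated); // Repeated words: [to, be]
    }
}
```

Note that the "distinct" set contains every word, including the repeated ones; the words that occur exactly once would be distinct minus repeated.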

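Note on String.split: two adjacent separators (for example the comma and the space in "hello, world") produce an empty string between them, and a separator at the very start of the string produces a leading empty string; only trailing empty strings are discarded. Without a filter, the empty string "" can therefore end up in the sets.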
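A small sketch for checking how String.split treats separators at the start, in the middle, and at the end of a string. The inputs are made up for illustration, and SplitDemo/show are hypothetical helper names. The last line also shows the effect of adding a + quantifier to the character class.

```java
import java.util.Arrays;

public class SplitDemo {
    // Returns the array produced by s.split(regex) in readable form.
    static String show(String s, String regex) {
        return Arrays.toString(s.split(regex));
    }

    public static void main(String[] args) {
        // Leading separator: a leading empty string is kept
        System.out.println(show(" a", "[ ,.]"));       // [, a]
        // Adjacent separators (comma + space): an empty string between them
        System.out.println(show("a, b", "[ ,.]"));     // [a, , b]
        // Trailing separators: trailing empty strings are dropped
        System.out.println(show("a b. ", "[ ,.]"));    // [a, b]
        // '+' merges runs of separators, but a leading one still yields ""
        System.out.println(show(" a, b.", "[ ,.]+"));  // [, a, b]
    }
}
```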

Now a variant that prepends the space instead of appending it, and filters out empty strings:

import java.io.*;
import java.util.*;
import java.nio.file.*;
public class Duqudata2 {
	public static void main(String[] args){
		String token="";
		Path path=Paths.get("D:\\out.txt");
		try(InputStream input=Files.newInputStream(path, StandardOpenOption.READ);
				Scanner sc=new Scanner(input)){
					while(sc.hasNext()){
						 token +=" "+sc.next();		// prepend the space, so token starts with a separator
					}
				}catch(IOException e){
					e.printStackTrace();
				}		
		String[] str=token.split("[ ,.]");	// note: the character class contains a space
		Set<String> hs=new HashSet<>();
		Set<String> se=new HashSet<>();	
		for(String s:str){
			if(!s.equals("")){              // note: "" contains no space; this skips the empty strings produced by split (comparing with " " would not skip them)
			      if(!hs.add(s)){           
				    se.add(s);
			       }	
			}
		}
		System.out.println("Distinct words: "+hs);
		System.out.println("Repeated words: "+se);
	}
}
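A possible tidier version, sketched under a few assumptions that are not in the original: it reads the whole file at once with Files.readAllBytes and assumes the file is UTF-8 (the original used Scanner with the platform default charset), which also avoids building the string with repeated += in a loop. It splits on runs of whitespace, commas and periods with [\\s,.]+, so an empty token can only appear as the first element, and it keeps the empty-string filter for that case. WordSets and classify are hypothetical names; the default path is the one from the article.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class WordSets {
    // Splits text on runs of whitespace, commas and periods, skips empty tokens,
    // and fills `distinct` and `repeated` using the return value of Set.add.
    static void classify(String text, Set<String> distinct, Set<String> repeated) {
        for (String s : text.split("[\\s,.]+")) {
            if (s.isEmpty()) {          // only the first token can be empty (text starts with a separator, or text is empty)
                continue;
            }
            if (!distinct.add(s)) {     // false: s was already seen
                repeated.add(s);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Path from the article by default; another path can be passed as args[0].
        Path path = Paths.get(args.length > 0 ? args[0] : "D:\\out.txt");
        // Assumption: the file is UTF-8 encoded.
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        Set<String> distinct = new LinkedHashSet<>();   // keeps first-seen order
        Set<String> repeated = new LinkedHashSet<>();
        classify(text, distinct, repeated);
        System.out.println("Distinct words: " + distinct);
        System.out.println("Repeated words: " + repeated);
    }
}
```

Splitting on \\s rather than a single space matters here because the whole file, including its line breaks, is split at once instead of going through Scanner first.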

The test text in D:\\out.txt is:

wqdsb
