I've been struggling to figure out how to get a word of unknown length from a string of unknown length that I'm reading from a file. The words I want are always separated by "." and/or "&", and the whole string is surrounded by quotes. Example: ".Word.Characters&Numeric&Letters.Typos&Mistypes." I know the location of each "." and "&" as well as how many times they occur.
I want to feed the words into an array Example[i][j] based on whether or not the words are separated by a "." or a "&". So words contained between "." would be set into the i column of the array and words linked by "&" into the j rows of the array.
The input string can contain a widely varying number of words: there may be only one word of interest, or more than a hundred.
I'd prefer to use arrays to solve this problem. From what I've read, regex would work but be slow. split() may also work, but I think I'd have to know what words to look for beforehand.
From this String: ".Word.Characters&Numeric&Letters.Typos&Mistypes." I'd expect to get: (without worrying about which is a row or column)
[[Word],[null],[null]],
[[Characters],[Numeric],[Letters]],
[[Typos],[Mistypes],[null]]
From this String ".Alpha.Beta.Zeta&Iota." I'd expect to get:
[[Alpha],[null]],
[[Beta],[null]],
[[Zeta],[Iota]]
//NumberOfPeriods tells me how many word "sections" are in the string
//Stor[] is an array that holds the string index locations of "."
for(int i=0;i<NumberOfPeriods;i++)
{
    int length = Stor[i];
    while(Line.charAt(length) != '"')
    {
        length++;
    }
    Example[i] = Line.substring(Stor[i], length);
}
//This code can get the words separated by "." but not by "&"
//Stor[] is an array that holds all string index locations of '.'
//AmpStor[] is an array that holds all string index locations of '&'
int TotalLength = Stor[0];
int InnerLength = 0;
int OuterLength = 0;
while(Line.charAt(TotalLength) != '"')
{
    while(Line.charAt(OuterLength)!='.')
    {
        while(Line.charAt(InnerLength)!='&')
        {
            InnerLength++;
        }
        if(Stor[i] > AmpStor[i])
        {
            Example[i][j] = Line.substring(Stor[i], InnerLength);
        }
        if(Stor[i] < AmpStor[i])
        {
            Example[i][j] = Line.substring(AmpStor[i],InnerLength);
        }
        OuterLength++;
    }
}
//Here I run into the issue of indexing into different parts of the array i & j
This is how I would solve your problem (it's completely different from your code but it works).
First of all, remove the quotes and the leading and trailing non-word characters. This can be done using replaceAll:
String Formatted = Line.replaceAll( "(^\"[.&]*)|([.&]*\"$)", "" );
The regular expression in the first argument matches the double quotes at both ends, together with any leading and trailing "." and "&" characters. The method returns a new string with the matched characters removed, because the second argument (the replacement) is an empty string.
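To see what the trimming step does on its own, here is a minimal sketch. The class name TrimDelimitersDemo and the helper trimDelimiters are made up for illustration; only the replaceAll call and its pattern come from the answer.

```java
public class TrimDelimitersDemo {
    // Hypothetical helper: strips the surrounding quotes plus any
    // leading/trailing '.' or '&' characters next to them.
    static String trimDelimiters( String line ) {
        return line.replaceAll( "(^\"[.&]*)|([.&]*\"$)", "" );
    }

    public static void main( String[] args ) {
        // Prints: Alpha.Beta.Zeta&Iota
        System.out.println( trimDelimiters( "\".Alpha.Beta.Zeta&Iota.\"" ) );
        // Prints: Alpha
        System.out.println( trimDelimiters( "\"Alpha\"" ) );
    }
}
```

Separators in the middle of the string are left alone, because each alternative in the pattern is anchored to a quote at the start or the end.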
Now you can split this string at each "." using the split method. The output array can only be defined after this call, because its size depends on how many groups the split produces:
String[] StringGroups = Formatted.split( "\\." );
String[][] Elements = new String[StringGroups.length][];
Put an escaped backslash (\\) before the dot so that the string is split on literal "." characters. The split method takes a regular expression, and an unescaped "." would match any character (except line terminators).
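The difference between the escaped and unescaped dot is easy to check. The sketch below uses made-up names (SplitOnDotDemo and its three helpers); Pattern.quote is shown as an alternative way to get a literal dot, which the answer itself does not use.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnDotDemo {
    // Escaped dot: splits on literal '.' characters.
    static String[] splitEscaped( String s ) {
        return s.split( "\\." );
    }

    // Unescaped dot: every character is treated as a separator.
    static String[] splitUnescaped( String s ) {
        return s.split( "." );
    }

    // Alternative: let Pattern.quote build the literal pattern.
    static String[] splitQuoted( String s ) {
        return s.split( Pattern.quote( "." ) );
    }

    public static void main( String[] args ) {
        String input = "Alpha.Beta.Zeta&Iota";
        // Prints: [Alpha, Beta, Zeta&Iota]
        System.out.println( Arrays.toString( splitEscaped( input ) ) );
        // Prints: [] (only empty strings result, and split drops trailing empties)
        System.out.println( Arrays.toString( splitUnescaped( input ) ) );
        // Prints: [Alpha, Beta, Zeta&Iota]
        System.out.println( Arrays.toString( splitQuoted( input ) ) );
    }
}
```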
Now split each string in that array at each "&" using the same split method. Add the result directly to your Elements array:
// Loop over the array
int MaxLength = 0;
for( int i = 0; i < StringGroups.length; i ++ ) {
    String StrGroup = StringGroups[ i ];
    String[] Group = StrGroup.split( "&" );
    Elements[ i ] = Group;
    // Measure the max length
    if( Group.length > MaxLength ) {
        MaxLength = Group.length;
    }
}
No \\ is needed here, since "&" has no special meaning in a regular expression and simply matches "&" characters. At this point Elements already holds every word, but its rows have different lengths. The MaxLength variable is only there for padding the shorter rows with null values. If you don't want the padding, just remove the MaxLength lines and you're done here.
If you want the null values however, loop over your Elements array and copy each row into a new array of length MaxLength:
for( int i = 0; i < Elements.length; i ++ ) {
    String[] Current = Elements[ i ];
    String[] New = new String[ MaxLength ];
    // Copy existing values into new array, extra values remain null
    System.arraycopy( Current, 0, New, 0, Current.length );
    Elements[ i ] = New;
}
Now, the Elements array contains exactly what you wanted.
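As a possible variation on the padding step, Arrays.copyOf does the allocate-and-copy in one call and fills the extra slots with null. This is a sketch rather than the answer's own code; the class PadRowsDemo and the helper padRows are hypothetical names.

```java
import java.util.Arrays;

public class PadRowsDemo {
    // Hypothetical helper: returns a rectangular copy of a jagged array,
    // padding shorter rows with null.
    static String[][] padRows( String[][] rows ) {
        int max = 0;
        for( String[] row : rows ) {
            max = Math.max( max, row.length );
        }
        String[][] padded = new String[ rows.length ][];
        for( int i = 0; i < rows.length; i ++ ) {
            // copyOf pads with null when the new length is larger
            padded[ i ] = Arrays.copyOf( rows[ i ], max );
        }
        return padded;
    }

    public static void main( String[] args ) {
        String[][] jagged = { { "Alpha" }, { "Beta" }, { "Zeta", "Iota" } };
        // Prints: [[Alpha, null], [Beta, null], [Zeta, Iota]]
        System.out.println( Arrays.deepToString( padRows( jagged ) ) );
    }
}
```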
Here is the complete executable code:
public class StringSplitterExample {
    public static void main( String[] args ) {
        test( "\".Word.Characters&Numeric&Letters.Typos&Mistypes.\"" );
        System.out.println(); // Line between
        test( "\".Alpha.Beta.Zeta&Iota.\"" );
    }

    public static void test( String Line ) {
        // Strip the quotes and the leading/trailing separators
        String Formatted = Line.replaceAll( "(^\"[.&]*)|([.&]*\"$)", "" );
        // One entry per "."-separated group
        String[] StringGroups = Formatted.split( "\\." );
        String[][] Elements = new String[StringGroups.length][];
        // Loop over the array
        int MaxLength = 0;
        for( int i = 0; i < StringGroups.length; i ++ ) {
            String StrGroup = StringGroups[ i ];
            String[] Group = StrGroup.split( "&" );
            Elements[ i ] = Group;
            // Measure the max length
            if( Group.length > MaxLength ) {
                MaxLength = Group.length;
            }
        }
        // Pad every row to MaxLength
        for( int i = 0; i < Elements.length; i ++ ) {
            String[] Current = Elements[ i ];
            String[] New = new String[ MaxLength ];
            // Copy existing values into new array, extra values remain null
            System.arraycopy( Current, 0, New, 0, Current.length );
            Elements[ i ] = New;
        }
        // Print one group per line
        for( String[] Group : Elements ) {
            for( String Element : Group ) {
                System.out.print( Element );
                System.out.print( " " );
            }
            System.out.println();
        }
    }
}
The output of this example:
Word null null
Characters Numeric Letters
Typos Mistypes null

Alpha null
Beta null
Zeta Iota
So this works, and you don't even need to know where the "." and "&" characters are in your string. Java will just do that for you.
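If you would rather avoid regular expressions altogether, as the question suggests, the same grouping can be sketched as a single pass over the characters. This is not the approach above but a different technique, under a few assumptions of mine: the class ManualScanDemo and the method parse are hypothetical, the result is a list of lists instead of a null-padded array, and empty words (for example between two consecutive separators) are skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualScanDemo {
    // Hypothetical helper: walks the string once, without regex.
    // '.' and '"' close the current group, '&' only closes the current word.
    static List<List<String>> parse( String line ) {
        List<List<String>> groups = new ArrayList<>();
        List<String> group = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for( int k = 0; k < line.length(); k ++ ) {
            char c = line.charAt( k );
            if( c == '.' || c == '&' || c == '"' ) {
                // Any separator ends the word collected so far
                if( word.length() > 0 ) {
                    group.add( word.toString() );
                    word.setLength( 0 );
                }
                // Only '.' and '"' end the group; empty groups are skipped
                if( c != '&' && !group.isEmpty() ) {
                    groups.add( group );
                    group = new ArrayList<>();
                }
            } else {
                word.append( c );
            }
        }
        // Flush whatever is left if the closing quote is missing
        if( word.length() > 0 ) {
            group.add( word.toString() );
        }
        if( !group.isEmpty() ) {
            groups.add( group );
        }
        return groups;
    }

    public static void main( String[] args ) {
        // Prints: [[Alpha], [Beta], [Zeta, Iota]]
        System.out.println( parse( "\".Alpha.Beta.Zeta&Iota.\"" ) );
    }
}
```

Whether this is actually faster than split for your inputs is something you would have to measure; it mainly shows that the positions of the separators do not need to be stored up front.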