Java check string for special format when format contains loops

BRHSM :

Introduction

I'm working on a project where a user is able to enter facts and rules in a special format but I'm having some trouble with checking if that format is correct and obtaining the information.

When the program is launched the user can enter "commands" into a text area, and this text is sent to a parseCommand method which determines what to do based on what the user has written. For example, to add a fact or a rule you use the prefix +, or use - to remove a fact or rule, and so on.

I've created the system which handles the prefix, but I'm having trouble with the format of the facts and rules.

Facts and rules

Facts: These are defined by an alphanumeric name and contain a list of properties (each is within <> signs) and a truth value. Properties are also defined by an alphanumeric name and contain 2 strings (called arguments), again each within <> signs. Properties can also be negative by placing an ! before them in the list. For example, the user could type any of the following to add a fact to the program:

+father(<parent(<John>,<Jake>)>, true)

+father(<parent(<Jammie>,<Jake>)>, false)

+father(!<parent(<Jammie>,<Jake>)>, true)

+familyTree(<parent(<John>,<Jake>)>, <parent(<Jammie>,<Jake>)> , true)

+fathers(<parent(<John>,<Jake>)>, !<parent(<Jammie>,<Jake>)> , true)

The class I use to store facts is like this:

public class Fact implements Serializable{

    private boolean truth;
    private ArrayList<Property> properties;
    private String name;

    public Fact(boolean truth, ArrayList<Property> properties, String name){
        this.truth = truth;
        this.properties = properties;
        this.name = name;
    }
    //getters and setters here...
}

Rules: These are links between 2 properties and they are identified by the => sign. Again, their name is alphanumeric. The properties are limited, though: their arguments can only be made up of uppercase letters, and the arguments of the second property have to be the same as those of the first one. Rules also have 2 optional arguments, each set by writing its name and left unset by omitting it (each of these arguments corresponds to a property of the rule, which can be Negative or Reversive). For example:

+son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>)

+son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Negative, Reversive)

+son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Reversive)

+son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Negative)

Rule Properties

A normal rule tells us that if, in the example below, X is a parent of Y this implies that Y is a child of X :

son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>)

While a Negative rule tells us that if, in the example below, X is a parent of Y this implies that Y is not a child of X :

son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Negative)

A Reversive rule, however, tells us that if, in the example below, Y is a child of X this implies that X is a parent of Y:

son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Reversive)

The last case is when the rule is both Negative and Reversive. This tells us that if, in the example below, Y is not a child of X this implies that X is a parent of Y.

son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Negative, Reversive)

This is the class I use to store rules:

public class Rule implements Serializable{

    private Property derivative;
    private Property impliant;
    private boolean negative;
    private boolean reversive;
    private String name;

    public Rule(Property derivative, Property impliant, boolean negative, boolean reversive) throws InvalidPropertyException{
        if(!this.validRuleProperty(derivative) || !this.validRuleProperty(impliant))
            throw new InvalidPropertyException("One or more properties are invalid");
        this.derivative = derivative;
        this.impliant = impliant;
        this.negative = negative;
        this.reversive = reversive;
    }
    //getters and setters here
}

Property class:

public class Property implements Serializable{

    private String name;
    private String firstArgument;
    private String secondArgument;

    public Property(String name, String firstArgument, String secondArgument){
        this.name = name;
        this.firstArgument = firstArgument;
        this.secondArgument = secondArgument;
    }
    //getters and setters here...
}

The above examples are all valid inputs. Just to clarify here are some invalid input examples:

Facts:

No true or false is provided for the argument:

+father(<parent(<John>,<Jake>)>)

No property given:

+father(false)

An invalid property is provided:

+father(<parent(<John>)>, true) 

+father(<parent(John, Jake)>, true) 

+father(<parent(John, Jake, Michel)>, true) 

+father(parent(<John>,<Jake>), true)

Note the missing angle brackets around the property in the last one.

Rules:

One or more properties are invalid:

+son(<parent(<X>,<Y>)> => child(<Y>,<X>))

+son(parent(<X>,<Y>) => child(<Y>,<X>))

+son(<parent(<X>,<Y>)> => <child(<Z>,<X>)>) (Note the Z in the child property)

+son(<parent(<Not Valid>,<Y>)> => child(<Y>,<X>)) (Invalid argument for first property)

+son(=> child(<Y>,<X>))

The problem

I'm able to get the input from the user and I'm also able to see which kind of action the user wants to perform based on the prefix.

However I'm not able to figure out how to process strings like:

+familyTree(<parent(<John>,<Jake>)>, <parent(<Jammie>,<Jake>)> , true)

This is due to a number of reasons:

1. The number of properties for a fact entered by the user is variable, so I can't just split the input string based on the () and <> signs.
2. For rules, the last 2 options are sometimes omitted, so 'Reversive' can appear at the position in the string where you would normally find 'Negative'.
3. If I want to get the arguments from this part of the input string: +familyTree(<parent(<John>,<Jake>)>, to set up the property for this fact, I can check for anything between <>, but that is a problem because there are 2 opening < before the first >.

What I've tried

My first idea was to start at the beginning of the string (which I did for getting the action from the prefix) and then remove that piece of string from the main string.

However, I don't know how to adapt this system to the problems above (especially problems 1 and 2).

I've tried to use functions like: String.split() and String.contains().

How would I go about doing this? How can I get around the fact that not all strings contain the same information? (In the sense that some facts have more properties, or some rules have more attributes, than others.)

EDIT:

I forgot to say that all the methods used to store the data are finished and work and they can be used by calling for example: infoHandler.addRule() or infoHandler.removeFact(). Inside these functions I could also validate input data if this is better.

I could, for example, just obtain all the data of the fact or rule from the string and then validate things such as whether the arguments of a rule's properties use only uppercase letters, and so on.

EDIT 2:

In the comments someone suggested using a parser generator like ANTLR or JavaCC. I've looked into that option over the last 3 days, but I can't seem to find any good source on how to define a custom language in it. Most documentation assumes you're trying to compile an existing language and recommends downloading the language file from somewhere instead of writing your own.

I'm trying to understand the basics of ANTLR (which seems to be the easiest one to use). However, there are not a lot of resources online to help me.

If this is a viable option, could anyone help me understand how to do something like this in ANTLR?

Also, once I've written a grammar file, how am I supposed to use it? I've read something about generating a parser from the language file, but I can't seem to figure out how that is done...

EDIT 3:

I've begun to work on a grammar file for ANTLR which looks like this:

/** Grammar used by communicate parser */

grammar communicate;


/*
 * Parser Rules
 */

argument            : '<' + NAMESTRING + '>' ;

ruleArgument        : '<' + RULESTRING + '>' ;

property            : NAMESTRING + '(' + argument + ',' + argument + ')' ;

propertyArgument    : (NEGATIVITY | POSITIVITY) + property + '>' ;

propertyList        : (propertyArgument + ',')+ ;

fact                : NAMESTRING + '(' + propertyList + ':' + (TRUE | FALSE) + ')';

rule                : NAMESTRING + '(' + ruleArgument + '=>' + ruleArgument + ':' + RULEOPTIONS + ')' ;

/*
 * Lexer Rules
 */

fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;

NAMESTRING          : (LOWERCASE | UPPERCASE)+ ;

RULESTRING          : (UPPERCASE)+ ;

TRUE                : 'True';

FALSE               : 'False';

POSITIVITY          : '!<';

NEGATIVITY          : '<' ;

NEWLINE             : ('\r'? '\n' | '\r')+ ;

RULEOPTIONS         : ('Negative' | 'Negative' + ',' + 'Reversive' | 'Reversive' );

WHITESPACE          : ' ' -> skip ;

Am I on the right track here? If this is a good grammar file, how can I test and use it later on?

Hekmatof :

I don't think a syntax analyzer is a good fit for your problem. You can handle it more simply by using regex and some string utilities.

It's better to start with the small problem and move on to the bigger ones. Parsing a property by itself is easy, so we write a method to do that first:

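private static Property toProp(String propStr) {
    // propStr looks like "<parent(<John>,<Jake>)>".
    // The name sits between the leading '<' and the first '('.
    String name = propStr.substring(1, propStr.indexOf('('));
    // The arguments sit between the first '(' and the first ')'.
    String[] arguments = propStr.substring(propStr.indexOf('(') + 1, propStr.indexOf(')')).split(",");
    // Strip the enclosing '<' and '>' from each argument.
    return new Property(name,
            arguments[0].substring(1, arguments[0].length() - 1),
            arguments[1].substring(1, arguments[1].length() - 1));
}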
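As a quick check of what that index arithmetic produces, here is a small self-contained sketch. The class name ToPropDemo, the nested stand-in Property class and the strip helper are made up for this example; the argument-count check is an addition that toProp above does not have.

```java
public class ToPropDemo {
    // Minimal stand-in for the Property class from the question (illustration only).
    static final class Property {
        final String name;
        final String firstArgument;
        final String secondArgument;

        Property(String name, String firstArgument, String secondArgument) {
            this.name = name;
            this.firstArgument = firstArgument;
            this.secondArgument = secondArgument;
        }
    }

    static Property toProp(String propStr) {
        int open = propStr.indexOf('(');
        int close = propStr.indexOf(')');
        // Skip the leading '<' and stop before the first '('.
        String name = propStr.substring(1, open);
        String[] arguments = propStr.substring(open + 1, close).split(",");
        // Added guard: a property must have exactly two arguments.
        if (arguments.length != 2) {
            throw new IllegalArgumentException("Expected two arguments: " + propStr);
        }
        return new Property(name, strip(arguments[0]), strip(arguments[1]));
    }

    // Removes surrounding whitespace and the enclosing '<' and '>'.
    private static String strip(String argument) {
        String trimmed = argument.trim();
        return trimmed.substring(1, trimmed.length() - 1);
    }

    public static void main(String[] args) {
        Property p = toProp("<parent(<John>,<Jake>)>");
        // prints "parent John Jake"
        System.out.println(p.name + " " + p.firstArgument + " " + p.secondArgument);
    }
}
```

With the guard in place, an input such as <parent(<John>)> is rejected with an IllegalArgumentException instead of failing later with an index error.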

To parse a Fact string, a regex makes things easier. The regex for a property is <[\w\d]*\([<>\w\d,]*\)> and, with the help of the toProp method we have already written, we can create another method to parse facts:

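import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumes the prefix (+ or -) has already been removed from factStr.
public static Fact handleFact(String factStr) {
    // Matches one property such as <parent(<John>,<Jake>)> (no whitespace inside).
    // A leading '!' before a property is not captured by this pattern.
    Pattern propertyPattern = Pattern.compile("<[\\w\\d]*\\([<>\\w\\d,]*\\)>");
    int s = factStr.indexOf('(') + 1;
    int l = factStr.lastIndexOf(')');
    String name = factStr.substring(0, s - 1);
    String params = factStr.substring(s, l);
    Matcher matcher = propertyPattern.matcher(params);
    // Fact's constructor takes an ArrayList, so the list is declared with that type.
    ArrayList<Property> props = new ArrayList<>();
    while (matcher.find()) {
        props.add(toProp(matcher.group()));
    }
    // Whatever follows the last property is the truth value, e.g. ", true".
    String[] split = propertyPattern.split(params);
    boolean truth = Boolean.parseBoolean(split[split.length - 1].replaceAll(",", "").trim());
    return new Fact(truth, props, name);
}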
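Note that handleFact does not record a leading ! and does not check the overall shape of the input (father(false), for example, yields a fact with no properties). One possible way to cover both while staying with regex is sketched below. The class FactCheckDemo and the methods isValidFact and describeProperties are names invented for this example, and it assumes that the + prefix has already been stripped, that names consist of letters and digits only, and that the truth value is written in lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FactCheckDemo {
    // Assumption: "alphanumeric" means ASCII letters and digits.
    private static final String NAME = "[A-Za-z0-9]+";
    // One property: optional '!' (group 1), name (group 2), two arguments (groups 3 and 4).
    private static final String PROP =
            "(!?)<(" + NAME + ")\\(<(" + NAME + ")>\\s*,\\s*<(" + NAME + ")>\\)>";
    private static final Pattern PROPERTY = Pattern.compile(PROP);
    // Whole fact: name, one or more properties each followed by a comma, then true or false.
    private static final Pattern FACT = Pattern.compile(
            NAME + "\\(\\s*(?:" + PROP + "\\s*,\\s*)+(?:true|false)\\s*\\)");

    // True only if the whole input has the shape of a fact.
    static boolean isValidFact(String input) {
        return FACT.matcher(input.trim()).matches();
    }

    // Lists the properties of a valid fact, prefixing negated ones with "not ".
    static List<String> describeProperties(String input) {
        if (!isValidFact(input)) {
            throw new IllegalArgumentException("Not a valid fact: " + input);
        }
        List<String> result = new ArrayList<>();
        Matcher m = PROPERTY.matcher(input);
        while (m.find()) {
            String prefix = m.group(1).isEmpty() ? "" : "not ";
            result.add(prefix + m.group(2) + "(" + m.group(3) + ", " + m.group(4) + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        String fact = "fathers(<parent(<John>,<Jake>)>, !<parent(<Jammie>,<Jake>)> , true)";
        // prints "[parent(John, Jake), not parent(Jammie, Jake)]"
        System.out.println(describeProperties(fact));
        // prints "false"
        System.out.println(isValidFact("father(false)"));
    }
}
```

Validating the whole string first with matches() and only then extracting the pieces with find() keeps the two concerns apart, so the variable number of properties is handled by the repetition in the pattern rather than by splitting.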

Parsing rules is very similar to facts:

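private static Rule handleRule(String ruleStr) throws InvalidPropertyException {
    Pattern propertyPattern = Pattern.compile("<[\\w\\d]*\\([<>\\w\\d,]*\\)>");
    String name = ruleStr.substring(0, ruleStr.indexOf('('));
    String params = ruleStr.substring(ruleStr.indexOf('(') + 1, ruleStr.lastIndexOf(')'));
    Matcher matcher = propertyPattern.matcher(params);
    // A rule needs a property on each side of "=>".
    if (!matcher.find())
        throw new IllegalArgumentException("Missing first property: " + ruleStr);
    Property prop1 = toProp(matcher.group());
    if (!matcher.find())
        throw new IllegalArgumentException("Missing second property: " + ruleStr);
    Property prop2 = toProp(matcher.group());
    // Remove both properties; what remains holds the optional flags.
    String flags = propertyPattern.matcher(params).replaceAll("").toLowerCase();
    // The Rule constructor in the question takes no name and declares
    // InvalidPropertyException, so the name is set afterwards. This assumes
    // a setName setter among the setters elided in the question.
    Rule rule = new Rule(prop1, prop2, flags.contains("negative"), flags.contains("reversive"));
    rule.setName(name);
    return rule;
}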
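The question also requires that a rule's arguments be uppercase only and that both properties use the same arguments, which handleRule leaves to the Rule constructor. If you would rather reject bad input before constructing anything, a check along these lines might work. It is only a sketch: RuleCheckDemo and isValidRule are invented names, and it assumes the + prefix is already removed, names are letters and digits only, and the flags appear in the order Negative, Reversive as in the question's examples.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleCheckDemo {
    // Assumption: "alphanumeric" means ASCII letters and digits.
    private static final String NAME = "[A-Za-z0-9]+";
    // A rule property: alphanumeric name and two uppercase-only arguments (both captured).
    private static final String PROP =
            "<" + NAME + "\\(<([A-Z]+)>\\s*,\\s*<([A-Z]+)>\\)>";
    // name(<prop> => <prop>[, Negative][, Reversive])
    private static final Pattern RULE = Pattern.compile(
            NAME + "\\(\\s*" + PROP + "\\s*=>\\s*" + PROP
            + "\\s*(?:,\\s*Negative)?\\s*(?:,\\s*Reversive)?\\s*\\)");

    static boolean isValidRule(String input) {
        Matcher m = RULE.matcher(input.trim());
        if (!m.matches()) {
            return false;
        }
        // Groups 1 and 2 are the first property's arguments, 3 and 4 the second's.
        Set<String> left = new HashSet<>(Arrays.asList(m.group(1), m.group(2)));
        Set<String> right = new HashSet<>(Arrays.asList(m.group(3), m.group(4)));
        // Order may differ (X,Y versus Y,X) but the same names must be used.
        return left.equals(right);
    }

    public static void main(String[] args) {
        // prints "true"
        System.out.println(isValidRule("son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Reversive)"));
        // prints "false" because Z does not occur in the first property
        System.out.println(isValidRule("son(<parent(<X>,<Y>)> => <child(<Z>,<X>)>)"));
    }
}
```

Because both flags are optional groups in the pattern, a rule with only Reversive is accepted without having to guess which position the flag occupies.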