Boosting performance of bulk HTTP REST calls by parallelizing GET method invocations

Gleb Kosteiko:

In the application I am developing, I need to execute a huge number of REST calls. The REST API resources I need to interact with are organized hierarchically:

/api/continents - returns a list of all of Earth's continents
/api/continents/{continent_name}/countries - returns a list of all countries on the given continent
/api/continents/{continent_name}/countries/{country_name}/cities - returns a list of all cities in the given country
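Because each resource is addressed by the names of its ancestors, every request URL has to be assembled from the previous levels' results. A minimal sketch of that, assuming a hypothetical scheme and host (`https://api.example.com`), since the question does not name the real server:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper that builds the three hierarchical resource URLs.
// SCHEME and HOST are assumptions; the question does not name them.
public class ApiPaths {
    private static final String SCHEME = "https";
    private static final String HOST = "api.example.com";

    static URI continents() {
        return build("/api/continents");
    }

    static URI countries(String continent) {
        return build("/api/continents/" + continent + "/countries");
    }

    static URI cities(String continent, String country) {
        return build("/api/continents/" + continent + "/countries/" + country + "/cities");
    }

    // The multi-argument URI constructor percent-encodes characters that are
    // not allowed in a path, so "North America" becomes "North%20America".
    // A name containing '/' would still be treated as a path separator.
    private static URI build(String path) {
        try {
            return new URI(SCHEME, HOST, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid path: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countries("North America"));
        // https://api.example.com/api/continents/North%20America/countries
    }
}
```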

Unfortunately, this API does not provide a method to get all cities at once, so I first have to get the list of all continents, then the list of all countries for each continent, and then the list of all cities for each country.
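Because every level depends on the one above it, the number of requests and an idealized lower bound on wall-clock time can be estimated as below. The symbols are introduced here for illustration only: C is the number of continents, K the total number of countries, \bar{t} an assumed mean latency per request, and P the number of requests in flight at once; the bound assumes uniform latency and no server-side throttling.

```latex
\begin{aligned}
N_{\text{calls}} &= 1 + C + K \\
T_{\text{seq}} &\approx N_{\text{calls}}\,\bar{t} \\
T_{\text{par}} &\gtrsim \max\!\left(3\,\bar{t},\ \frac{N_{\text{calls}}\,\bar{t}}{P}\right)
\end{aligned}
```

The 3\bar{t} term is the chain of continents, then countries, then cities, which cannot be parallelized; beyond that, the total time shrinks only as P grows.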

First I implemented the method for getting all cities without any parallelization, using only sequential calls. Something like this:

private List<City> getCities() {
    List<Continent> continents = getAllContinents(); // HTTP GET call
    List<Country> countries = new ArrayList<>();
    for (Continent continent : continents) {
        countries.addAll(getAllCountriesOfContinent(continent)); // one HTTP GET call per continent
    }
    List<City> cities = new ArrayList<>();
    for (Country country : countries) {
        cities.addAll(getAllCitiesOfCountry(country)); // one HTTP GET call per country
    }
    return cities;
}

But this approach was far too slow: it took about 7 hours to run. I tried to improve it using Java parallel streams and CompletableFuture, and ended up with these methods:

private List<City> getCities() {
    return getAllContinents()
        .parallelStream()
        .map(continent -> getAllCountriesOfContinent(continent)) // starts an async request
        .flatMap(future -> future.join().parallelStream())        // blocks until it completes
        .map(country -> getAllCitiesOfCountry(country))
        .flatMap(future -> future.join().parallelStream())
        .collect(Collectors.toList());
}

where the getAllCountriesOfContinent and getAllCitiesOfCountry methods return a CompletableFuture of a list and look like this:

// supplyAsync without an Executor argument normally runs the call on ForkJoinPool.commonPool()
private CompletableFuture<List<Country>> getAllCountriesOfContinent(Continent continent) {
    return CompletableFuture.supplyAsync(() -> restClient.getDataFromApi(continent));
}

private CompletableFuture<List<City>> getAllCitiesOfCountry(Country country) {
    return CompletableFuture.supplyAsync(() -> restClient.getDataFromApi(country));
}

With this refactoring I got a nice performance boost: it now takes about 25-30 minutes. But I think I could improve it further using Java ThreadPoolExecutors and Threads, or the ForkJoin framework. Will such approaches help boost the performance of my code, or are there other techniques, algorithms, or frameworks for this?

GhostCat salutes Monica C.:

Will such approaches help me to boost performance?

The answer is: probably.

You see, parallelStream() gives you a "default" implementation of multithreading (under the covers, this operation actually uses the ForkJoin framework, namely the common ForkJoinPool).
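To see what that default amounts to on a given machine, the snippet below prints the size of the common ForkJoinPool. Parallel streams run on this pool, and so does CompletableFuture.supplyAsync(Supplier) when no Executor is passed (unless the pool's parallelism is below two, in which case a new thread is created per task). By default its parallelism is availableProcessors() - 1 (at least 1), and it can be overridden with the system property `java.util.concurrent.ForkJoinPool.common.parallelism`. The pool is sized for CPU-bound work, while REST calls mostly wait on the network, which is one reason tuning may pay off here.

```java
import java.util.concurrent.ForkJoinPool;

// Prints the size of the shared pool behind parallel streams and behind
// CompletableFuture.supplyAsync(Supplier) when no Executor is given.
public class CommonPoolInfo {
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```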

In other words: you can always step back and invest many hours in experiments with different low-level approaches, measuring the results of each. And yes, if you spend a week fine-tuning your algorithms, you will most likely end up with something better than the "default implementations" Java has to offer.
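As one example of such a lower-level experiment, the sketch below swaps the common pool for a dedicated fixed-size thread pool (Executors.newFixedThreadPool, backed by a ThreadPoolExecutor) and submits every request of a level before joining any of them, whereas the question's pipeline joins each future right after creating it. The pool size of 32 and the fakeCountries/fakeCities stand-ins are hypothetical; real code would call the REST client instead and handle failures, since join() rethrows a failed call as a CompletionException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: run each level of the hierarchy on a dedicated I/O thread pool,
// submitting every request of the level before waiting for any of them.
public class LevelFanOut {

    // Starts fetch(parent) for all parents, then joins and flattens the
    // results in the original order.
    static <P, C> List<C> fetchAll(List<P> parents,
                                   Function<P, List<C>> fetch,
                                   ExecutorService pool) {
        List<CompletableFuture<List<C>>> futures = parents.stream()
                .map(p -> CompletableFuture.supplyAsync(() -> fetch.apply(p), pool))
                .collect(Collectors.toList()); // all requests are submitted here
        return futures.stream()
                .flatMap(f -> f.join().stream())
                .collect(Collectors.toList());
    }

    // Hypothetical stand-ins for the real REST calls.
    static List<String> fakeCountries(String continent) {
        return Arrays.asList(continent + "/A", continent + "/B");
    }

    static List<String> fakeCities(String country) {
        return Arrays.asList(country + "/x");
    }

    public static void main(String[] args) {
        // 32 threads is an arbitrary starting point, not a recommendation.
        ExecutorService pool = Executors.newFixedThreadPool(32);
        try {
            List<String> continents = Arrays.asList("Europe", "Asia");
            List<String> countries = fetchAll(continents, LevelFanOut::fakeCountries, pool);
            List<String> cities = fetchAll(countries, LevelFanOut::fakeCities, pool);
            System.out.println(cities); // [Europe/A/x, Europe/B/x, Asia/A/x, Asia/B/x]
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this beats the 25-30 minutes depends on the pool size and on how many concurrent requests the server will accept, which only measurement can tell.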

But how much of an improvement you will get, and how long it will take you to get there, is very hard to predict.

Thus, the real answer would be:

  • to measure how long each operation takes, to identify the real bottlenecks in your overall system (for example: should a typical client use one thread per country to fetch the cities, or would a smaller number of threads help more?)
  • if possible, get the REST API enhanced so that it simply returns the list of cities directly
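Measuring does not need much machinery for calls this slow. A minimal wall-clock timer is sketched below; the 200 ms sleep is a hypothetical stand-in for one HTTP call. In the question's code, a level could be wrapped as, for example, `timed("continents", () -> getAllContinents())`, and the same wrapper can compare runs with different pool sizes.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Minimal wall-clock timer for comparing variants of the crawl
// (sequential vs. parallel, per level, or per pool size).
public class Timing {
    static <T> T timed(String label, Supplier<T> action) {
        long start = System.nanoTime();
        try {
            return action.get();
        } finally {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(label + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // Stand-in workload: sleep 200 ms to mimic one slow HTTP call.
        String result = timed("simulated call", () -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done";
        });
        System.out.println(result);
    }
}
```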

Long story short: you have to make a trade-off. You can write a lot of custom code to get better results, but nobody can tell you upfront how much you will gain, or how much "cost" will be added to your "budget" by writing and maintaining more complicated code over time.
