Thursday, February 22, 2024

The Efficiency of Numerous To-Many Nesting Algorithms – Java, SQL and jOOQ.


It’s been some time since jOOQ 3.15 has been launched with its revolutionary normal SQL MULTISET emulation function. A factor that has been lengthy overdue and which I promised on twitter a number of occasions is to run a number of benchmarks evaluating the efficiency of varied approaches to nesting to-many relationships with jOOQ.

This text will present, that in some actual world situations on modest knowledge set sizes, jOOQ’s MULTISET emulation performs about in addition to

  • manually working single be a part of queries and manually deduplicating outcomes
  • manually working a number of queries per nest stage and matching leads to the shopper

In distinction, the entire above carry out significantly better than the dreaded N+1 “method” (or reasonably, accident), all of the whereas being far more readable and maintainable.

The conclusion is:

  • For jOOQ customers to freely use MULTISET at any time when smallish knowledge units are used (i.e. a nested loop be a part of could be OK, too)
  • For jOOQ customers to make use of MULTISET rigorously the place large-ish knowledge units are used (i.e. a hash be a part of or merge be a part of could be higher, e.g. in studies)
  • For ORM distributors to choose the a number of queries per nest stage method in case they’re in full management of their SQL to materialise predefined object graphs

Benchmark thought

As all the time, we’re querying the well-known Sakila database. There are two sorts of queries that I’ve examined on this benchmark.

A question that double-nests little one collections (DN = DoubleNesting)

The outcome will likely be of the shape:

file DNCategory (String identify) {}
file DNFilm (lengthy id, String title, Listing<DNCategory> classes) {}
file DNName (String firstName, String lastName) {}
file DNActor (lengthy id, DNName identify, Listing<DNFilm> movies) {}

So, the outcome will likely be actors and their movies and their classes per movie. If a single be a part of is being executed, this could trigger a whole lot of duplication within the knowledge (though regrettably, in our check knowledge set, every movie solely has a single class)

A question that nests two little one collections in a single dad or mum (MCC = A number of Youngster Collections)

The outcome will likely be of the shape:

file MCCName (String firstName, String lastName) {}
file MCCActor (lengthy id, MCCName identify) {}
file MCCCategory (String identify) {}
file MCCFilm (
    lengthy id, 
    String title, 
    Listing<MCCActor> actors, 
    Listing<MCCCategory> classes
) {}

So, the outcome will likely be movies and their actors in addition to their classes. That is laborious to deduplicate with a single be a part of, due to the cartesian product between ACTOR × CATEGORY. However different approaches with a number of queries nonetheless work, in addition to MULTISET, in fact, which would be the most handy choice

Information set measurement

Along with the above distinction of use-cases, the benchmark may even attempt to pull in both:

  • The whole knowledge set (we’ve 1000 FILM entries in addition to 200 ACTOR entries, so not an enormous knowledge set), the place hash joins are usually higher
  • Solely the subset for both ACTOR_ID = 1 or FILM_ID = 1, respectively, the place nested loop joins are usually higher

The expectation right here is {that a} JOIN tends to carry out higher on bigger outcome units, as a result of the RDBMS will choose a hash be a part of algorithm. It’s unlikely the MULTISET emulation will be remodeled right into a hash be a part of or merge be a part of, provided that it makes use of JSON_ARRAYAGG which could be tough to remodel into one thing completely different, which continues to be equal.

Benchmark exams

The next issues will likely be benchmarked for every mixture of the above matrix:

  1. A single MULTISET question with its 3 obtainable emulations utilizing XML (the place obtainable), JSON, JSONB
  2. A single JOIN question that creates a cartesian product between dad or mum and kids
  3. An method that runs 2 queries fetching all the required knowledge into shopper reminiscence, and performs the nesting within the shopper, thereafter
  4. A naive N+1 “shopper aspect nested loop be a part of,” which is horrible however not unlikely to occur in actual world shopper code, both with jOOQ (much less seemingly, however nonetheless potential), or with a lazy loading ORM (extra seemingly, as a result of “unintentional”)

The complete benchmark logic will likely be posted on the finish of this text.

1. Single MULTISET question (DN)

The question seems like this:

return state.ctx.choose(
    ACTOR.ACTOR_ID,

    // Nested file for the actor identify
    row(
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME
    ).mapping(DNName::new),

    // First stage nested assortment for movies per actor
    multiset(
        choose(
            FILM_ACTOR.FILM_ID,
            FILM_ACTOR.movie().TITLE,

            // Second stage nested assortment for classes per movie
            multiset(
                choose(FILM_CATEGORY.class().NAME)
                .from(FILM_CATEGORY)
                .the place(FILM_CATEGORY.FILM_ID.eq(FILM_ACTOR.FILM_ID))
            ).convertFrom(r -> r.map(mapping(DNCategory::new)))
        )
        .from(FILM_ACTOR)
        .the place(FILM_ACTOR.ACTOR_ID.eq(ACTOR.ACTOR_ID))
    ).convertFrom(r -> r.map(mapping(DNFilm::new))))
.from(ACTOR)

// Both fetch all knowledge or filter ACTOR_ID = 1
.the place(state.filter ? ACTOR.ACTOR_ID.eq(1L) : noCondition())
.fetch(mapping(DNActor::new));

To be taught extra about the precise MULTISET syntax and the ad-hoc conversion function, please check with earlier weblog posts explaining the main points. The identical is true for the implicit JOIN function, which I’ll be utilizing on this submit to maintain SQL a bit extra terse.

2. Single JOIN question (DN)

We are able to do every part with a single be a part of as nicely. On this instance, I’m utilizing a purposeful fashion to remodel the flat outcome into the doubly nested assortment in a sort secure manner. It’s a bit quirky, maybe there are higher methods to do that with non-JDK APIs. Since I wouldn’t anticipate this to be efficiency related, I feel it’s adequate:

// The question is simple. Simply be a part of every part from
// ACTOR -> FILM -> CATEGORY by way of the connection tables
return state.ctx.choose(
    FILM_ACTOR.ACTOR_ID,
    FILM_ACTOR.actor().FIRST_NAME,
    FILM_ACTOR.actor().LAST_NAME,
    FILM_ACTOR.FILM_ID,
    FILM_ACTOR.movie().TITLE,
    FILM_CATEGORY.class().NAME)
.from(FILM_ACTOR)
.be a part of(FILM_CATEGORY).on(FILM_ACTOR.FILM_ID.eq(FILM_CATEGORY.FILM_ID))
.the place(state.filter ? FILM_ACTOR.ACTOR_ID.eq(1L) : noCondition())

// Now comes the difficult half. We first use JDK Collectors to group
// outcomes by ACTOR
.gather(groupingBy(
    r -> new DNActor(
        r.value1(), 
        new DNName(r.value2(), r.value3()), 

        // dummy FILM record, we won't simply gather them right here, but
        null 
    ),

    // For every actor, produce an inventory of FILM, once more with a dummy
    // CATEGORY record as we won't gather them right here but
    groupingBy(r -> new DNFilm(r.value4(), r.value5(), null))
))

// Set<Entry<DNActor, Map<DNFilm, Listing<Record6<...>>>>> 
.entrySet()
.stream()

// Re-map the DNActor file into itself, however this time, add the
// nested DNFilm record.
.map(a -> new DNActor(
    a.getKey().id(),
    a.getKey().identify(),
    a.getValue()
     .entrySet()
     .stream()
     .map(f -> new DNFilm(
         f.getKey().id(),
         f.getKey().title(),
         f.getValue().stream().map(
             c -> new DNCategory(c.value6())
         ).toList()
     ))
     .toList()
))
.toList();

Probably, this instance may very well be improved to keep away from the dummy assortment placeholders within the first gather() name, though that will most likely require extra file sorts or structural tuple sorts like these from jOOλ. I stored it “easy” for this instance, although I’ll take your options within the feedback.

3. Two queries merged in reminiscence (DN)

A superbly advantageous answer is to run a number of queries (however not N+1 queries!), i.e. one question per stage of nesting. This isn’t all the time potential, or optimum, however on this case, there’s an inexpensive answer.

I’m spelling out the prolonged Record5<...> sorts to indicate the precise sorts on this weblog submit. You should use var to revenue from kind inference, in fact. All of those queries use Document.value5() and comparable accessors to revenue from index primarily based entry, simply to be truthful, stopping the sphere lookup, which isn’t mandatory within the benchmark.

// Simple question to get ACTORs and their FILMs
Outcome<Record5<Lengthy, String, String, Lengthy, String>> actorAndFilms =
state.ctx
    .choose(
        FILM_ACTOR.ACTOR_ID,
        FILM_ACTOR.actor().FIRST_NAME,
        FILM_ACTOR.actor().LAST_NAME,
        FILM_ACTOR.FILM_ID,
        FILM_ACTOR.movie().TITLE)
    .from(FILM_ACTOR)
    .the place(state.filter ? FILM_ACTOR.ACTOR_ID.eq(1L) : noCondition())
    .fetch();

// Create a FILM.FILM_ID => CATEGORY.NAME lookup
// That is simply fetching all of the movies and their classes.
// Optionally, filter for the earlier FILM_ID record
Map<Lengthy, Listing<DNCategory>> categoriesPerFilm = state.ctx
    .choose(
        FILM_CATEGORY.FILM_ID,
        FILM_CATEGORY.class().NAME)
    .from(FILM_CATEGORY)
    .the place(state.filter
        ? FILM_CATEGORY.FILM_ID.in(actorAndFilms.map(r -> r.value4()))
        : noCondition())
    .gather(intoGroups(
        r -> r.value1(),
        r -> new DNCategory(r.value2())
    ));

// Group once more by ACTOR and FILM, utilizing the earlier dummy
// assortment trick
return actorAndFilms
    .gather(groupingBy(
        r -> new DNActor(
            r.value1(), 
            new DNName(r.value2(), r.value3()), 
            null
        ),
        groupingBy(r -> new DNFilm(r.value4(), r.value5(), null))
    ))
    .entrySet()
    .stream()

    // Then change the dummy collections
    .map(a -> new DNActor(
        a.getKey().id(),
        a.getKey().identify(),
        a.getValue()
         .entrySet()
         .stream()
         .map(f -> new DNFilm(
             f.getKey().id(),
             f.getKey().title(),

             // And use the CATEGORY per FILM lookup
             categoriesPerFilm.get(f.getKey().id())
         ))
         .toList()
    ))
    .toList();

Whew. Unwieldy. Actually, the MULTISET method is to be most popular from a readability perspective? All this mapping to middleman structural knowledge sorts will be heavy, particularly in the event you make a typo and the compiler journeys.

4. N+1 queries (DN)

This naive answer is hopefully not what you’re doing largely in manufacturing, however we’ve all achieved it in some unspecified time in the future (sure, responsible!), so right here it’s. Not less than, the logic is extra readable than the earlier ones, it’s as simple as the unique MULTISET instance, in actual fact, as a result of it does nearly the identical factor because the MULTISET instance, however as an alternative of doing every part in SQL, it correlates the subqueries within the shopper:

// Fetch all ACTORs
return state.ctx
    .choose(ACTOR.ACTOR_ID, ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
    .from(ACTOR)
    .the place(state.filter ? ACTOR.ACTOR_ID.eq(1L) : noCondition())
    .fetch(a -> new DNActor(
        a.value1(),
        new DNName(a.value2(), a.value3()),

        // And for every ACTOR, fetch all FILMs
        state.ctx
            .choose(FILM_ACTOR.FILM_ID, FILM_ACTOR.movie().TITLE)
            .from(FILM_ACTOR)
            .the place(FILM_ACTOR.ACTOR_ID.eq(a.value1()))
            .fetch(f -> new DNFilm(
                f.value1(),
                f.value2(),

                // And for every FILM, fetch all CATEGORY-s
                state.ctx
                    .choose(FILM_CATEGORY.class().NAME)
                    .from(FILM_CATEGORY)
                    .the place(FILM_CATEGORY.FILM_ID.eq(f.value1()))
                    .fetch(r -> new DNCategory(r.value1()))
            ))
    ));

1. Single MULTISET question (MCC)

Now, we’ll repeat the train once more to gather knowledge right into a extra tree like knowledge construction, the place the dad or mum kind has a number of little one collections, one thing that’s far more tough to do with JOIN queries. Piece of cake with MULTISET, which nests the collections straight in SQL:

return state.ctx
    .choose(
        FILM.FILM_ID,
        FILM.TITLE,

        // Get all ACTORs for every FILM
        multiset(
            choose(
                FILM_ACTOR.ACTOR_ID,
                row(
                    FILM_ACTOR.actor().FIRST_NAME,
                    FILM_ACTOR.actor().LAST_NAME
                ).mapping(MCCName::new)
            )
            .from(FILM_ACTOR)
            .the place(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
        ).convertFrom(r -> r.map(mapping(MCCActor::new))),

        // Get all CATEGORY-s for every FILM
        multiset(
            choose(FILM_CATEGORY.class().NAME)
            .from(FILM_CATEGORY)
            .the place(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
        ).convertFrom(r -> r.map(mapping(MCCCategory::new))))
    .from(FILM)
    .the place(state.filter ? FILM.FILM_ID.eq(1L) : noCondition())
    .fetch(mapping(MCCFilm::new));

Once more, to be taught extra about the precise MULTISET syntax and the ad-hoc conversion function, please check with earlier weblog posts explaining the main points. The identical is true for the implicit JOIN function, which I’ll be utilizing on this submit to maintain SQL a bit extra terse.

2. Single JOIN Question (MCC)

One of these nesting could be very laborious to do with a single JOIN question, as a result of there will likely be a cartesian product between ACTOR and CATEGORY, which can be laborious to deduplicate after the actual fact. On this case, it might be potential, as a result of we all know that every ACTOR is listed solely as soon as per FILM, and so is every CATEGORY. However what if this wasn’t the case? It may not be potential to appropriately take away duplicates, as a result of we wouldn’t have the ability to distinguish:

  • Duplicates originating from the JOIN cartesian product
  • Duplicates originating from the underlying knowledge set

As it’s laborious (most likely not unimaginable) to ensure correctness, it’s futile to check efficiency right here.

3. Two queries merged in reminiscence (MCC)

That is once more fairly an inexpensive implementation of this type of nesting utilizing extraordinary JOIN queries.

It’s most likely what most ORMs do that don’t but help MULTISET like assortment nesting. It’s completely cheap to make use of this method when the ORM is in full management of the generated queries (e.g. when fetching pre-defined object graphs). However when permitting customized queries, this method gained’t work nicely for advanced queries. For instance, JPQL’s JOIN FETCH syntax might use this method behind the scenes, however this prevents JPQL from supporting non-trivial queries the place JOIN FETCH is utilized in derived tables or correlated subqueries, and itself joins derived tables, and many others. Right me if I’m incorrect, however I feel that appears to be extremely laborious to get proper, to remodel advanced nested queries into a number of particular person queries which can be executed one after the opposite, solely to then reassemble outcomes.

In any case, it’s an method that works nicely for ORMs who’re in command of their SQL, however is laborious for finish customers to implement manually.

// Simple question to get ACTORs and their FILMs
Outcome<Record5<Lengthy, String, Lengthy, String, String>> filmsAndActors = 
state.ctx
    .choose(
        FILM_ACTOR.FILM_ID,
        FILM_ACTOR.movie().TITLE,
        FILM_ACTOR.ACTOR_ID,
        FILM_ACTOR.actor().FIRST_NAME,
        FILM_ACTOR.actor().LAST_NAME)
    .from(FILM_ACTOR)
    .the place(state.filter ? FILM_ACTOR.FILM_ID.eq(1L) : noCondition())
    .fetch();

// Create a FILM.FILM_ID => CATEGORY.NAME lookup
// That is simply fetching all of the movies and their classes.
Map<Lengthy, Listing<MCCCategory>> categoriesPerFilm = state.ctx
    .choose(
        FILM_CATEGORY.FILM_ID,
        FILM_CATEGORY.class().NAME)
    .from(FILM_CATEGORY)
    .the place(FILM_CATEGORY.FILM_ID.in(
        filmsAndActors.map(r -> r.value1())
    ))
    .and(state.filter ? FILM_CATEGORY.FILM_ID.eq(1L) : noCondition())
    .gather(intoGroups(
        r -> r.value1(),
        r -> new MCCCategory(r.value2())
    ));

// Group once more by ACTOR and FILM, utilizing the earlier dummy
// assortment trick
return filmsAndActors
    .gather(groupingBy(
        r -> new MCCFilm(r.value1(), r.value2(), null, null),
        groupingBy(r -> new MCCActor(
            r.value3(), 
            new MCCName(r.value4(), r.value5())
        ))
    ))
    .entrySet()
    .stream()

    // This time, the nesting of CATEGORY-s is less complicated as a result of
    // we do not have to nest them once more deeply
    .map(f -> new MCCFilm(
        f.getKey().id(),
        f.getKey().title(),
        new ArrayList<>(f.getValue().keySet()),
        categoriesPerFilm.get(f.getKey().id())
    ))
    .toList();

As you cam see, it nonetheless seems like a chore to do all of this grouping and nesting manually, ensuring all of the intermediate structural sorts are right, however at the least the MCC case is a bit less complicated than the earlier DN case as a result of the nesting is much less deep.

However everyone knows, we’ll finally mix the approaches and nest tree constructions of arbitrary complexity.

4. N+1 queries (MCC)

Once more, don’t do that at residence (or in manufacturing), however we’ve all been there and right here’s what a whole lot of purposes do both explicitly (disgrace on the creator!), or implicitly (disgrace on the ORM for permitting the creator to place disgrace on the creator!)

// Fetch all FILMs
return state.ctx
    .choose(FILM.FILM_ID, FILM.TITLE)
    .from(FILM)
    .the place(state.filter ? FILM.FILM_ID.eq(1L) : noCondition())
    .fetch(f -> new MCCFilm(
        f.value1(),
        f.value2(),

        // For every FILM, fetch all ACTORs
        state.ctx
            .choose(
                FILM_ACTOR.ACTOR_ID,
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME)
            .from(FILM_ACTOR)
            .the place(FILM_ACTOR.FILM_ID.eq(f.value1()))
            .fetch(a -> new MCCActor(
                a.value1(),
                new MCCName(a.value2(), a.value3())
            )),

        // For every FILM, fetch additionally all CATEGORY-s
        state.ctx
            .choose(FILM_CATEGORY.class().NAME)
            .from(FILM_CATEGORY)
            .the place(FILM_CATEGORY.FILM_ID.eq(f.value1()))
            .fetch(c -> new MCCCategory(c.value1()))
    ));

Algorithmic complexities

Earlier than we transfer on to the benchmark outcomes, please be very cautious along with your interpretation, as all the time.

The purpose of this benchmark wasn’t to discover a clear winner (or put disgrace on a clear loser). The purpose of this benchmark was to verify if the MULTISET method has any important and apparent profit and/or disadvantage over the opposite, extra handbook and unwieldy approaches.

Don’t conclude that if one thing is 1.5x or 3x sooner than one thing else, that it’s higher. It could be on this case, nevertheless it is probably not in several circumstances, e.g.

  • When the info set is smaller
  • When the info set is larger
  • When the info set is distributed in a different way (e.g. many extra classes per movie, or a much less common variety of movies per actor (sakila knowledge units have been generated reasonably uniformly))
  • When switching distributors
  • When switching variations
  • When you’ve extra load on the system
  • When your queries are extra various (benchmarks are likely to run solely single queries, which tremendously revenue from caching within the database server!)

So, once more, as with each benchmark outcome, be very cautious along with your interpretation.

The N+1 case

Even the N+1 case, which might flip into one thing horrible isn’t all the time the incorrect selection.

As we all know from Large O Notation, issues with unhealthy algorithmic complexities seem solely when N is huge, not when it’s small.

  • The algorithmic complexity of a single nested assortment is O(N * log M), i.e. N occasions wanting up values in an index for M values (assuming there’s an index)
  • The algorithmic complexity of a doubly nested assortment, nonetheless, is far worse, it’s O(N * log M * ? * log L), i.e. N occasions wanting up values in an index for M values, after which ? occasions (depends upon the distribution) wanting up values in an index for L values.

Higher hope all of those values are very small. If they’re, you’re advantageous. In the event that they aren’t, you’ll discover in manufacturing on a weekend.

The MULTISET case

Whereas I maintain advocating MULTISET because the holy grail as a result of it’s so highly effective, handy, kind secure, and fairly performant, as we’ll see subsequent, it isn’t the holy grail like every part else we ever hoped for promising holy-graily-ness.

Whereas it could be theoretically potential to implement some hash be a part of fashion nested assortment algorithm within the MULTISET case, I believe that the emulations, which at the moment use XMLAGG, JSON_ARRAYAGG or comparable constructs, gained’t be optimised this fashion, and as such, we’ll get correlated subqueries, which is actually N+1, however 100% on the server aspect.

As increasingly more individuals use SQL/JSON options, these could be optimised additional sooner or later, although. I wouldn’t maintain my breath for RDBMS distributors investing time to enhance SQL/XML an excessive amount of (regrettably).

We are able to confirm the execution plan by working an EXPLAIN (on PostgreSQL) on the question generated by jOOQ for the doubly nested assortment case:

clarify choose
  actor.actor_id,
  row (actor.first_name, actor.last_name),
  (
    choose coalesce(
      json_agg(json_build_array(v0, v1, v2)),
      json_build_array()
    )
    from (
      choose
        film_actor.film_id as v0,
        alias_75379701.title as v1,
        (
          choose coalesce(
            json_agg(json_build_array(v0)),
            json_build_array()
          )
          from (
            choose alias_130639425.identify as v0
            from (
              film_category
                be a part of class as alias_130639425
                  on film_category.category_id = 
                    alias_130639425.category_id
              )
            the place film_category.film_id = film_actor.film_id
          ) as t
        ) as v2
      from (
        film_actor
          be a part of movie as alias_75379701
            on film_actor.film_id = alias_75379701.film_id
        )
      the place film_actor.actor_id = actor.actor_id
    ) as t
  )
from actor
the place actor.actor_id = 1

The result’s:

QUERY PLAN                                                                                                                   
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on actor  (value=0.00..335.91 rows=1 width=72)                                                                       
  Filter: (actor_id = 1)                                                                                                     
  SubPlan 2                                                                                                                  
    ->  Mixture  (value=331.40..331.41 rows=1 width=32)                                                                     
          ->  Hash Be a part of  (value=5.09..73.73 rows=27 width=23)                                                                 
                Hash Cond: (alias_75379701.film_id = film_actor.film_id)                                                     
                ->  Seq Scan on movie alias_75379701  (value=0.00..66.00 rows=1000 width=23)                                   
                ->  Hash  (value=4.75..4.75 rows=27 width=8)                                                                  
                      ->  Index Solely Scan utilizing film_actor_pkey on film_actor  (value=0.28..4.75 rows=27 width=8)             
                            Index Cond: (actor_id = actor.actor_id)                                                          
          SubPlan 1                                                                                                          
            ->  Mixture  (value=9.53..9.54 rows=1 width=32)                                                                 
                  ->  Hash Be a part of  (value=8.30..9.52 rows=1 width=7)                                                            
                        Hash Cond: (alias_130639425.category_id = film_category.category_id)                                 
                        ->  Seq Scan on class alias_130639425  (value=0.00..1.16 rows=16 width=15)                         
                        ->  Hash  (value=8.29..8.29 rows=1 width=8)                                                           
                              ->  Index Solely Scan utilizing film_category_pkey on film_category  (value=0.28..8.29 rows=1 width=8)
                                    Index Cond: (film_id = film_actor.film_id)                                               

As anticipated, two nested scalar subqueries. Don’t get side-tracked by the hash joins inside the subqueries. These are anticipated, as a result of we’re becoming a member of between e.g. FILM and FILM_ACTOR, or between CATEGORY and FILM_CATEGORY within the subquery. However this doesn’t have an effect on how the 2 subqueries are correlated to the outer-most question, the place we can’t use any hash joins.

So, we’ve an N+1 scenario, simply with out the latency of working a server roundtrip each time! The algorithmic complexity is similar, however the fixed overhead per merchandise has been eliminated, permitting for larger N earlier than it hurts – nonetheless the method will fail finally, similar to having too many JOIN on massive knowledge units is inefficient in RDBMS that don’t help hash be a part of or merge be a part of, however solely nested loop be a part of (e.g. older MySQL variations).

Future variations of jOOQ might help MULTISET extra natively on Oracle and PostgreSQL. It’s already supported natively in Informix, which has normal SQL MULTISET help. In PostgreSQL, issues may very well be achieved utilizing ARRAY(<subquery>) and ARRAY_AGG(), which could be extra clear to the optimiser than JSON_AGG. Whether it is, I’ll undoubtedly comply with up with one other weblog submit.

The only JOIN question case

I’d anticipate this method to work OK-ish, if the nested collections aren’t too huge (i.e. there isn’t an excessive amount of duplicate knowledge). As soon as the nested collections develop larger, the deduplication will bear fairly some prices as:

  • Extra redundant knowledge must be produced on the server aspect (requiring extra reminiscence and CPU)
  • Extra redundant knowledge must be transferred over the wire
  • Extra deduplication must be achieved on the shopper aspect (requiring extra reminiscence and CPU)

All in all, this method appears foolish for advanced nesting, however doable for a single nested assortment. This benchmark doesn’t check large deduplications.

The 1-query-per-nest-level case

I’d anticipate the very unwieldy 1-query-per-nest-level case to be probably the most performant as N scales. It’s additionally comparatively simple for an ORM to implement, in case the ORM is in full management of the generated SQL and doesn’t must respect any consumer question necessities. It gained’t work nicely if blended into consumer question syntax, and it’s laborious to do for customers manually each time.

Nonetheless, it’s an “after-the-fact” assortment nesting method, which means that it solely works nicely if some assumptions in regards to the authentic question will be maintained. E.g. JOIN FETCH in JPQL solely takes you this far. It could have been an OK workaround to nesting collections and making the idea obtainable for easy circumstances, however I’m constructive JPA / JPQL will evolve and in addition undertake MULTISET primarily based approaches. In spite of everything, MULTISET has been a SQL normal for ORDBMS for ages now.

The long-term answer for nesting collections can solely be to nest them straight in SQL, and make all of the logic obtainable to the optimiser for its varied selections.

Benchmark outcomes

Lastly, some outcomes! I’ve run the benchmark on these 4 RDBMS:

  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server

I didn’t run it on Db2, which might’t correlate derived tables but, an important function for correlated MULTISET subqueries in jOOQ in 3.15 – 3.17’s MULTISET emulation (see https://github.com/jOOQ/jOOQ/points/12045 for particulars).

As all the time, since benchmark outcomes can’t be revealed for business RDBMS, I’m not publishing precise occasions, solely relative occasions, the place the slowest execution is 1, and sooner executions are multiples of 1. Think about some precise unit of measurement, like operations/second, solely it’s not per second however per undisclosed unit of time. That manner, the RDBMS can solely be in contrast with themselves, not with one another.

MySQL:

Benchmark                                                        (filter)   Mode  Cnt     Rating     Error   Items
MultisetVsJoinBenchmark.doubleNestingJoin                            true  thrpt    7   4413.48 ±  448.63   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSON                    true  thrpt    7   2524.96 ±  402.38   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSONB                   true  thrpt    7   2738.62 ±  332.37   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries                 true  thrpt    7    265.37 ±   42.98   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingTwoQueries                      true  thrpt    7   2256.38 ±  363.18   ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.doubleNestingJoin                           false  thrpt    7    266.27 ±   13.31   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSON                   false  thrpt    7     54.98 ±    2.25   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSONB                  false  thrpt    7     54.05 ±    1.58   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries                false  thrpt    7      1.00 ±    0.11   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingTwoQueries                     false  thrpt    7    306.23 ±   11.64   ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON         true  thrpt    7   3058.68 ±  722.24   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB        true  thrpt    7   3179.18 ±  333.77   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries      true  thrpt    7   1845.75 ±  167.26   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries           true  thrpt    7   2425.76 ±  579.73   ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON        false  thrpt    7     91.78 ±    2.65   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB       false  thrpt    7     92.48 ±    2.25   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries     false  thrpt    7      2.84 ±    0.48   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries          false  thrpt    7    171.66 ±    19.8   ops/time-unit

Oracle:

Benchmark                                                        (filter)   Mode  Cnt     Rating     Error   Items
MultisetVsJoinBenchmark.doubleNestingJoin                            true  thrpt    7    669.54 ±   28.35   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSON                    true  thrpt    7    419.13 ±   23.60   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSONB                   true  thrpt    7    432.40 ±   17.76   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetXML                     true  thrpt    7    351.42 ±   18.70   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries                 true  thrpt    7    251.73 ±   30.19   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingTwoQueries                      true  thrpt    7    548.80 ±  117.40   ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.doubleNestingJoin                           false  thrpt    7     15.59 ±    1.86   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSON                   false  thrpt    7      2.41 ±    0.07   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSONB                  false  thrpt    7      2.40 ±    0.07   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetXML                    false  thrpt    7      1.91 ±    0.06   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries                false  thrpt    7      1.00 ±    0.12   ops/time-unit
MultisetVsJoinBenchmark.doubleNestingTwoQueries                     false  thrpt    7     13.63 ±    1.57   ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON         true  thrpt    7   1217.79 ±   89.87   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB        true  thrpt    7   1214.07 ±   76.10   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML          true  thrpt    7    702.11 ±   87.37   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries      true  thrpt    7    919.47 ±  340.63   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries           true  thrpt    7   1194.05 ±  179.92   ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON        false  thrpt    7      2.89 ±    0.08   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB       false  thrpt    7      3.00 ±    0.05   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML         false  thrpt    7      1.04 ±    0.17   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries     false  thrpt    7      1.52 ±    0.08   ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries          false  thrpt    7     13.00 ±    1.96   ops/time-unit

PostgreSQL:

Benchmark                                                        (filter)   Mode  Cnt     Rating     Error   Items
MultisetVsJoinBenchmark.doubleNestingJoin                            true   thrpt   7     4128.21 ± 398.82  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSON                    true   thrpt   7     3187.88 ± 409.30  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSONB                   true   thrpt   7     3064.69 ± 154.75  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetXML                     true   thrpt   7     1973.44 ± 166.22  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries                 true   thrpt   7      267.15 ±  34.01  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingTwoQueries                      true   thrpt   7     2081.03 ± 317.95  ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.doubleNestingJoin                           false   thrpt   7      275.95 ±   6.80  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSON                   false   thrpt   7       53.94 ±   1.06  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSONB                  false   thrpt   7       45.00 ±   0.52  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetXML                    false   thrpt   7       25.11 ±   1.01  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries                false   thrpt   7        1.00 ±   0.07  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingTwoQueries                     false   thrpt   7      306.11 ±  35.40  ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON         true   thrpt   7     4710.47 ± 194.37  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB        true   thrpt   7     4391.78 ± 223.62  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML          true   thrpt   7     2740.73 ± 186.70  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries      true   thrpt   7     1792.94 ± 134.50  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries           true   thrpt   7     2821.82 ± 252.34  ops/time-unit
                                                                                                            
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON        false   thrpt   7       68.45 ±   2.58  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB       false   thrpt   7       58.59 ±   0.58  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML         false   thrpt   7       15.58 ±   0.35  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries     false   thrpt   7        2.71 ±   0.16  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries          false   thrpt   7      163.03 ±   7.54  ops/time-unit

SQL Server:

Benchmark                                                        (filter)   Mode  Cnt    Rating     Error  Items
MultisetVsJoinBenchmark.doubleNestingJoin                            true  thrpt    7  4081.85 ± 1029.84  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSON                    true  thrpt    7  1243.17 ±   84.24  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSONB                   true  thrpt    7  1254.13 ±   56.94  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetXML                     true  thrpt    7  1077.23 ±   61.50  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries                 true  thrpt    7   264.45 ±   16.12  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingTwoQueries                      true  thrpt    7  1608.92 ±  145.75  ops/time-unit
                                                                                             
MultisetVsJoinBenchmark.doubleNestingJoin                           false  thrpt    7   359.08 ±   20.88  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSON                   false  thrpt    7     8.41 ±    0.06  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetJSONB                  false  thrpt    7     8.32 ±    0.15  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingMultisetXML                    false  thrpt    7     7.24 ±    0.08  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries                false  thrpt    7     1.00 ±    0.09  ops/time-unit
MultisetVsJoinBenchmark.doubleNestingTwoQueries                     false  thrpt    7   376.02 ±    7.60  ops/time-unit
                                                                                                  
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON         true  thrpt    7  1735.23 ±  178.30  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB        true  thrpt    7  1736.01 ±   92.26  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML          true  thrpt    7  1339.68 ±  137.47  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries      true  thrpt    7  1264.50 ±  343.56  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries           true  thrpt    7  1057.54 ±  130.13  ops/time-unit
                                                                                                  
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON        false  thrpt    7     7.90 ±    0.05  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB       false  thrpt    7     7.85 ±    0.15  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML         false  thrpt    7     5.06 ±    0.18  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries     false  thrpt    7     1.77 ±    0.28  ops/time-unit
MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries          false  thrpt    7   255.08 ±   19.22  ops/time-unit

Benchmark conclusion

As will be seen in most RDBMS:

  • All RDBMS produced comparable outcomes.
  • The added latency of N+1 nearly all the time contributed a major efficiency penalty. An exception is the place we’ve a filter = true and a number of little one collections, in case of which the dad or mum N is 1 (duh) and solely a single nest stage is carried out.
  • MULTISET carried out even higher than the one question JOIN primarily based method or the 1-query-per-nest-level method when filter = true and with a number of little one collections, most likely due to the extra compact knowledge format.
  • The XML primarily based MULTISET emulation is all the time the slowest among the many emulations, seemingly as a result of it requires extra formatting. (In a single Oracle case, the XML primarily based MULTISET emulation was even slower than the extraordinary N+1 method).
  • JSONB is a bit slower in PostgreSQL than JSON, seemingly as a result of JSON is a purely textual content primarily based format, with none submit processing / cleanup. JSONB‘s benefit is just not with projection-only queries, however with storage, comparability, and different operations. For many usages, JSONB might be higher. For projection solely, JSON is best (jOOQ 3.17 will make this the default for the MULTISET emulation)
  • It’s price noting that jOOQ serialises data as JSON arrays, not JSON objects, with a view to keep away from transferring repetitive column names, and provide positional index when deserialising the array.
  • For big-ish knowledge units (the place filter = false), the N+1 issue of a MULTISET correlated subquery can turn into an issue (because of the nature of algorithmic complexity), because it prevents utilizing extra environment friendly hash joins. In these circumstances, the one question JOIN primarily based method or 1-query-per-nest-level method are higher

In brief:

  • MULTISET can be utilized at any time when a nested loop be a part of is perfect.
  • If a hash be a part of or merge be a part of could be extra optimum, then the one question JOIN method or the 1-query-per-nest-level method are likely to carry out higher (although they’ve their very own caveats as complexity grows)

The profit in comfort and correctness is certainly price it for small knowledge units. For bigger knowledge units, proceed utilizing JOIN. As all the time, there’s no silver bullet.

Issues this weblog submit didn’t examine

A couple of issues weren’t investigated by this weblog submit, together with:

  • The serialisation overhead within the server. Strange JDBC ResultSet are likely to revenue from a binary community protocol between server and shopper. With JSON or XML, that advantage of protocol compactness goes away, and a scientific overhead is produced. To what extent this performs a job has not been investigated.
  • The identical is true on the shopper aspect, the place nested JSON or XML paperwork must be deserialised. Whereas the next VisualVM screenshot reveals that there’s some overhead, it’s not important in comparison with the execution time. Additionally, it isn’t a major quantity extra overhead than what jOOQ already produces when mapping between ResultSet and jOOQ knowledge constructions. I imply, clearly, utilizing JDBC straight will be sooner in the event you’re doing it proper, however then you definately take away all of the comfort jOOQ creates.
Evaluating time it takes to deserialise outcome (4.7%) vs execute question (92%)

Benchmark code

Lastly, do you have to want to reproduce this benchmark, or adapt it to your individual wants, right here’s the code.

I’ve used JMH for the benchmark. Whereas that is clearly not a “micro benchmark,” I like JMH’s method to benchmarking, together with:

  • Straightforward configuration
  • Warmup penalty is eliminated by doing warmup iterations
  • Statistics are collected to deal with outlier results

Clearly, all of the model use jOOQ for question constructing, execution, mapping, to attain truthful and significant outcomes. It might be potential to make use of JDBC straight within the non-MULTISET approaches, however that wouldn’t be a good comparability of ideas.

The benchmark assumes availability of a SAKILA database occasion, in addition to generated code, much like this jOOQ demo.

bundle org.jooq.check.benchmarks.native;

import static java.util.stream.Collectors.groupingBy;
import static org.jooq.Information.intoGroups;
import static org.jooq.Information.mapping;
import static org.jooq.instance.db.postgres.Tables.*;
import static org.jooq.impl.DSL.multiset;
import static org.jooq.impl.DSL.noCondition;
import static org.jooq.impl.DSL.row;
import static org.jooq.impl.DSL.choose;

import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Listing;
import java.util.Map;
import java.util.Properties;
import java.util.operate.Client;

import org.jooq.DSLContext;
import org.jooq.Record5;
import org.jooq.Outcome;
import org.jooq.conf.NestedCollectionEmulation;
import org.jooq.conf.Settings;
import org.jooq.impl.DSL;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Degree;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Warmup;

@Fork(worth = 1)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 7, time = 3)
public class MultisetVsJoinBenchmark {

    @State(Scope.Benchmark)
    public static class BenchmarkState {

        Connection connection;
        DSLContext ctx;

        @Param({ "true", "false" })
        boolean    filter;

        @Setup(Degree.Trial)
        public void setup() throws Exception {
            strive (InputStream is = BenchmarkState.class.getResourceAsStream("/config.mysql.properties")) {
                Properties p = new Properties();
                p.load(is);
                Class.forName(p.getProperty("db.mysql.driver"));
                connection = DriverManager.getConnection(
                    p.getProperty("db.mysql.url"),
                    p.getProperty("db.mysql.username"),
                    p.getProperty("db.mysql.password")
                );
            }

            ctx = DSL.utilizing(connection, new Settings()
                .withExecuteLogging(false)
                .withRenderSchema(false));
        }

        @TearDown(Degree.Trial)
        public void teardown() throws Exception {
            connection.shut();
        }
    }

    file DNName(String firstName, String lastName) {}
    file DNCategory(String identify) {}
    file DNFilm(lengthy id, String title, Listing<DNCategory> classes) {}
    file DNActor(lengthy id, DNName identify, Listing<DNFilm> movies) {}

    @Benchmark
    public Listing<DNActor> doubleNestingMultisetXML(BenchmarkState state) {
        state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.XML);
        return doubleNestingMultiset0(state);
    }

    @Benchmark
    public Listing<DNActor> doubleNestingMultisetJSON(BenchmarkState state) {
        state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.JSON);
        return doubleNestingMultiset0(state);
    }

    @Benchmark
    public Listing<DNActor> doubleNestingMultisetJSONB(BenchmarkState state) {
        state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.JSONB);
        return doubleNestingMultiset0(state);
    }

    non-public Listing<DNActor> doubleNestingMultiset0(BenchmarkState state) {
        return state.ctx
            .choose(
                ACTOR.ACTOR_ID,
                row(
                    ACTOR.FIRST_NAME,
                    ACTOR.LAST_NAME
                ).mapping(DNName::new),
                multiset(
                    choose(
                        FILM_ACTOR.FILM_ID,
                        FILM_ACTOR.movie().TITLE,
                        multiset(
                            choose(FILM_CATEGORY.class().NAME)
                            .from(FILM_CATEGORY)
                            .the place(FILM_CATEGORY.FILM_ID.eq(FILM_ACTOR.FILM_ID))
                        ).convertFrom(r -> r.map(mapping(DNCategory::new)))
                    )
                    .from(FILM_ACTOR)
                    .the place(FILM_ACTOR.ACTOR_ID.eq(ACTOR.ACTOR_ID))
                ).convertFrom(r -> r.map(mapping(DNFilm::new))))
            .from(ACTOR)
            .the place(state.filter ? ACTOR.ACTOR_ID.eq(1L) : noCondition())
            .fetch(mapping(DNActor::new));
    }

    @Benchmark
    public Listing<DNActor> doubleNestingJoin(BenchmarkState state) {
        return state.ctx
            .choose(
                FILM_ACTOR.ACTOR_ID,
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME,
                FILM_ACTOR.FILM_ID,
                FILM_ACTOR.movie().TITLE,
                FILM_CATEGORY.class().NAME)
            .from(FILM_ACTOR)
            .be a part of(FILM_CATEGORY).on(FILM_ACTOR.FILM_ID.eq(FILM_CATEGORY.FILM_ID))
            .the place(state.filter ? FILM_ACTOR.ACTOR_ID.eq(1L) : noCondition())
            .gather(groupingBy(
                r -> new DNActor(r.value1(), new DNName(r.value2(), r.value3()), null),
                groupingBy(r -> new DNFilm(r.value4(), r.value5(), null))
            ))
            .entrySet()
            .stream()
            .map(a -> new DNActor(
                a.getKey().id(),
                a.getKey().identify(),
                a.getValue()
                 .entrySet()
                 .stream()
                 .map(f -> new DNFilm(
                     f.getKey().id(),
                     f.getKey().title(),
                     f.getValue().stream().map(c -> new DNCategory(c.value6())).toList()
                 ))
                 .toList()
            ))
            .toList();
    }

    @Benchmark
    public Listing<DNActor> doubleNestingTwoQueries(BenchmarkState state) {
        Outcome<Record5<Lengthy, String, String, Lengthy, String>> actorAndFilms = state.ctx
            .choose(
                FILM_ACTOR.ACTOR_ID,
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME,
                FILM_ACTOR.FILM_ID,
                FILM_ACTOR.movie().TITLE)
            .from(FILM_ACTOR)
            .the place(state.filter ? FILM_ACTOR.ACTOR_ID.eq(1L) : noCondition())
            .fetch();

        Map<Lengthy, Listing<DNCategory>> categoriesPerFilm = state.ctx
            .choose(
                FILM_CATEGORY.FILM_ID,
                FILM_CATEGORY.class().NAME)
            .from(FILM_CATEGORY)
            .the place(state.filter
                ? FILM_CATEGORY.FILM_ID.in(actorAndFilms.map(r -> r.value4()))
                : noCondition())
            .gather(intoGroups(
                r -> r.value1(),
                r -> new DNCategory(r.value2())
            ));

        return actorAndFilms
            .gather(groupingBy(
                r -> new DNActor(r.value1(), new DNName(r.value2(), r.value3()), null),
                groupingBy(r -> new DNFilm(r.value4(), r.value5(), null))
            ))
            .entrySet()
            .stream()
            .map(a -> new DNActor(
                a.getKey().id(),
                a.getKey().identify(),
                a.getValue()
                 .entrySet()
                 .stream()
                 .map(f -> new DNFilm(
                     f.getKey().id(),
                     f.getKey().title(),
                     categoriesPerFilm.get(f.getKey().id())
                 ))
                 .toList()
            ))
            .toList();
    }

    @Benchmark
    public Listing<DNActor> doubleNestingNPlusOneQueries(BenchmarkState state) {
        return state.ctx
            .choose(ACTOR.ACTOR_ID, ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
            .from(ACTOR)
            .the place(state.filter ? ACTOR.ACTOR_ID.eq(1L) : noCondition())
            .fetch(a -> new DNActor(
                a.value1(),
                new DNName(a.value2(), a.value3()),
                state.ctx
                    .choose(FILM_ACTOR.FILM_ID, FILM_ACTOR.movie().TITLE)
                    .from(FILM_ACTOR)
                    .the place(FILM_ACTOR.ACTOR_ID.eq(a.value1()))
                    .fetch(f -> new DNFilm(
                        f.value1(),
                        f.value2(),
                        state.ctx
                            .choose(FILM_CATEGORY.class().NAME)
                            .from(FILM_CATEGORY)
                            .the place(FILM_CATEGORY.FILM_ID.eq(f.value1()))
                            .fetch(r -> new DNCategory(r.value1()))
                    ))
            ));
    }

    file MCCName(String firstName, String lastName) {}
    file MCCCategory(String identify) {}
    file MCCActor(lengthy id, MCCName identify) {}
    file MCCFilm(lengthy id, String title, Listing<MCCActor> actors, Listing<MCCCategory> classes) {}

    @Benchmark
    public Listing<MCCFilm> multipleChildCollectionsMultisetXML(BenchmarkState state) {
        state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.XML);
        return multipleChildCollectionsMultiset0(state);
    }

    @Benchmark
    public Listing<MCCFilm> multipleChildCollectionsMultisetJSON(BenchmarkState state) {
        state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.JSON);
        return multipleChildCollectionsMultiset0(state);
    }

    @Benchmark
    public Listing<MCCFilm> multipleChildCollectionsMultisetJSONB(BenchmarkState state) {
        state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.JSONB);
        return multipleChildCollectionsMultiset0(state);
    }

    non-public Listing<MCCFilm> multipleChildCollectionsMultiset0(BenchmarkState state) {
        return state.ctx
            .choose(
                FILM.FILM_ID,
                FILM.TITLE,
                multiset(
                    choose(
                        FILM_ACTOR.ACTOR_ID,
                        row(
                            FILM_ACTOR.actor().FIRST_NAME,
                            FILM_ACTOR.actor().LAST_NAME
                        ).mapping(MCCName::new)
                    )
                    .from(FILM_ACTOR)
                    .the place(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
                ).convertFrom(r -> r.map(mapping(MCCActor::new))),
                multiset(
                    choose(
                        FILM_CATEGORY.class().NAME
                    )
                    .from(FILM_CATEGORY)
                    .the place(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
                ).convertFrom(r -> r.map(mapping(MCCCategory::new))))
            .from(FILM)
            .the place(state.filter ? FILM.FILM_ID.eq(1L) : noCondition())
            .fetch(mapping(MCCFilm::new));
    }

    @Benchmark
    public Listing<MCCFilm> multipleChildCollectionsTwoQueries(BenchmarkState state) {
        Outcome<Record5<Lengthy, String, Lengthy, String, String>> filmsAndActors = state.ctx
            .choose(
                FILM_ACTOR.FILM_ID,
                FILM_ACTOR.movie().TITLE,
                FILM_ACTOR.ACTOR_ID,
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME)
            .from(FILM_ACTOR)
            .the place(state.filter ? FILM_ACTOR.FILM_ID.eq(1L) : noCondition())
            .fetch();

        Map<Lengthy, Listing<MCCCategory>> categoriesPerFilm = state.ctx
            .choose(
                FILM_CATEGORY.FILM_ID,
                FILM_CATEGORY.class().NAME)
            .from(FILM_CATEGORY)
            .the place(FILM_CATEGORY.FILM_ID.in(
                filmsAndActors.map(r -> r.value1())
            ))
            .and(state.filter ? FILM_CATEGORY.FILM_ID.eq(1L) : noCondition())
            .gather(intoGroups(
                r -> r.value1(),
                r -> new MCCCategory(r.value2())
            ));

        return filmsAndActors
            .gather(groupingBy(
                r -> new MCCFilm(r.value1(), r.value2(), null, null),
                groupingBy(r -> new MCCActor(r.value3(), new MCCName(r.value4(), r.value5())))
            ))
            .entrySet()
            .stream()
            .map(f -> new MCCFilm(
                f.getKey().id(),
                f.getKey().title(),
                new ArrayList<>(f.getValue().keySet()),
                categoriesPerFilm.get(f.getKey().id())
            ))
            .toList();
    }

    @Benchmark
    public Listing<MCCFilm> multipleChildCollectionsNPlusOneQueries(BenchmarkState state) {
        return state.ctx
            .choose(FILM.FILM_ID, FILM.TITLE)
            .from(FILM)
            .the place(state.filter ? FILM.FILM_ID.eq(1L) : noCondition())
            .fetch(f -> new MCCFilm(
                f.value1(),
                f.value2(),
                state.ctx
                    .choose(
                        FILM_ACTOR.ACTOR_ID,
                        FILM_ACTOR.actor().FIRST_NAME,
                        FILM_ACTOR.actor().LAST_NAME)
                    .from(FILM_ACTOR)
                    .the place(FILM_ACTOR.FILM_ID.eq(f.value1()))
                    .fetch(a -> new MCCActor(
                        a.value1(),
                        new MCCName(a.value2(), a.value3())
                    )),
                state.ctx
                    .choose(FILM_CATEGORY.class().NAME)
                    .from(FILM_CATEGORY)
                    .the place(FILM_CATEGORY.FILM_ID.eq(f.value1()))
                    .fetch(c -> new MCCCategory(c.value1()))
            ));
    }
}

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles