06. Router & Query Plan — fan-out · merge

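At runtime, Federation comes down to one thing: the router builds a query plan and executes it. Building that plan is a non-trivial optimization problem, much like the one a database query optimizer solves.

This document covers how the router works internally and how to read and debug a plan.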

One-line answer

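The router decomposes a client query against the supergraph SDL: which field comes from which subgraph, in what order (parallel or sequential), and how fetch dependencies are resolved. The resulting plan is a DAG (directed acyclic graph), and a good plan minimizes fan-out while maximizing parallelism. The cost: roughly 1.2x to 2x average latency, plus fan-out across N subgraphs.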

Why: why a query plan is needed

# Client query
query ProductPage {
  productById(id: "p1") {       # Product subgraph
    name                         # Product subgraph
    price                        # Pricing subgraph (via @override)
    reviews(first: 3) {          # Review subgraph
      rating
      author {                   # User subgraph
        name
      }
    }
    inStock                      # Inventory subgraph
  }
}

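Five subgraphs are involved. How should the router distribute the work?

Bad plan: call all five sequentially, so latency grows to roughly 5x. Worse plan: call the User subgraph once per review author, which is N+1. Good plan: run independent fetches in parallel, run only dependent fetches sequentially, and batch whatever can be batched.

That is the query planner's job.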

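How: components of a query plan

Four node types

An Apollo Router plan is a DAG built from four core node types.

Fetch: sends a partial query to one subgraph
Parallel: runs its child nodes concurrently
Sequence: runs its child nodes in order (because of dependencies)
Flatten: runs its child fetch against the entities found at a given response path and merges the result back at that path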
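
The four node types above can be modeled as a small recursive enum. The following is a minimal sketch, not Apollo Router's actual data structure: `PlanNode`, `fetch_count`, and `sequential_depth` are hypothetical names chosen for illustration. It shows how two properties of a plan, the number of subgraph requests and the longest chain of dependent requests, fall directly out of the tree shape.

```rust
/// Hypothetical, simplified model of the four node types (illustration only).
#[derive(Debug)]
enum PlanNode {
    Fetch { service: String },
    Parallel(Vec<PlanNode>),
    Sequence(Vec<PlanNode>),
    Flatten { path: String, node: Box<PlanNode> },
}

/// Total number of subgraph requests the plan issues.
fn fetch_count(node: &PlanNode) -> usize {
    match node {
        PlanNode::Fetch { .. } => 1,
        PlanNode::Parallel(children) | PlanNode::Sequence(children) => {
            children.iter().map(fetch_count).sum()
        }
        PlanNode::Flatten { node, .. } => fetch_count(node),
    }
}

/// Longest chain of fetches that must run one after another.
fn sequential_depth(node: &PlanNode) -> usize {
    match node {
        PlanNode::Fetch { .. } => 1,
        // Parallel children overlap, so only the deepest one counts.
        PlanNode::Parallel(children) => {
            children.iter().map(sequential_depth).max().unwrap_or(0)
        }
        // Sequence children wait for each other, so their depths add up.
        PlanNode::Sequence(children) => children.iter().map(sequential_depth).sum(),
        PlanNode::Flatten { node, .. } => sequential_depth(node),
    }
}
```

Flatten does not change either number in this model: it only says where in the response the child fetch applies.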

Example plan (for the ProductPage query above)

QueryPlan {
  Sequence {
    // Step 1: Product subgraph
    Fetch(service: "product") {
      { productById(id:"p1") {
          __typename id name
      } }
    },
    // Step 2: several subgraphs in parallel
    Parallel {
      Flatten(path: "productById") {
        Fetch(service: "pricing") {
          { _entities(representations:[$repr]) {
              ... on Product { price }
          } }
        }
      },
      Flatten(path: "productById") {
        Fetch(service: "inventory") {
          { _entities(representations:[$repr]) {
              ... on Product { inStock }
          } }
        }
      },
      Flatten(path: "productById") {
        Sequence {
          // Step 3a: Review subgraph
          Fetch(service: "review") {
            { _entities(representations:[$repr]) {
                ... on Product { reviews(first:3) {
                  __typename rating author { __typename id }
                } }
            } }
          },
          // Step 3b: User subgraph for the authors (batched)
          Flatten(path: "productById.reviews.@.author") {
            Fetch(service: "user") {
              { _entities(representations:[$reprs]) {
                  ... on User { name }
              } }
            }
          }
        }
      }
    }
  }
}

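→ Step 2 starts only after Step 1 finishes (dependency). Within Step 2, Pricing, Inventory, and Review run in parallel. Within the Review branch, the authors are fetched in one batch (avoiding N+1).

Cost analysis

Step 1: 1 fetch → P_product latency
Step 2: 3 branches in parallel → max(P_pricing, P_inventory, P_review + P_user_batched)
Total ≈ P_product + max(...)   # 2 sequential hops, or 3 when the review → user branch is the slowest

→ A monolith needs 1 round trip. Federation needs at least 2 here, and 3 or more when dependencies run deep.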
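
The formula above can be checked mechanically: a Sequence costs the sum of its children and a Parallel costs the maximum. The sketch below uses a hypothetical `Step` type and made-up per-subgraph latencies (20, 15, 10, 25, and 15 ms are assumptions, not measurements) to evaluate the ProductPage plan.

```rust
/// Hypothetical plan shape carrying an assumed latency per fetch.
enum Step {
    Fetch(&'static str, u32), // (service name, assumed latency in ms)
    Parallel(Vec<Step>),
    Sequence(Vec<Step>),
}

/// Critical-path latency: children of a Sequence add up,
/// children of a Parallel overlap so only the slowest counts.
fn latency_ms(step: &Step) -> u32 {
    match step {
        Step::Fetch(_, ms) => *ms,
        Step::Parallel(children) => children.iter().map(latency_ms).max().unwrap_or(0),
        Step::Sequence(children) => children.iter().map(latency_ms).sum(),
    }
}

/// The ProductPage plan with illustrative latencies.
fn product_page_plan() -> Step {
    Step::Sequence(vec![
        Step::Fetch("product", 20),
        Step::Parallel(vec![
            Step::Fetch("pricing", 15),
            Step::Fetch("inventory", 10),
            Step::Sequence(vec![Step::Fetch("review", 25), Step::Fetch("user", 15)]),
        ]),
    ])
}
```

With these numbers the result is 20 + max(15, 10, 25 + 15) = 60 ms: the review → user branch sets the critical path, so the request pays three sequential hops even though five fetches run.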

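How: choosing the optimal plan

When the same query can be served by several plans, which one is optimal?

The Apollo Router planner compares candidate plans using a cost function. Cost factors:

Number of fetches: fewer is better
Sequential depth: shallower is better (more work can run in parallel)
Payload size of each fetch: smaller is better
Preferred subgraph: a @shareable field is best folded into a subgraph that is already being fetched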
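
To make the idea of comparing candidates with a cost function concrete, here is a minimal sketch. The weights (1000 per sequential hop, 100 per fetch, 1 per KiB of payload) and the names `PlanStats`, `cost`, and `pick_cheapest` are assumptions for illustration; the real planner's cost model is not described in this document.

```rust
/// Hypothetical summary of one candidate plan.
#[derive(Debug, Clone, PartialEq)]
struct PlanStats {
    name: &'static str,
    fetches: u32,
    sequential_depth: u32,
    payload_bytes: u32,
}

/// Assumed weights: a sequential hop costs a full round trip, so depth
/// dominates, then fetch count, then payload size.
fn cost(p: &PlanStats) -> u64 {
    1_000 * p.sequential_depth as u64
        + 100 * p.fetches as u64
        + (p.payload_bytes as u64) / 1_024
}

/// Returns the cheapest candidate; on a tie the earliest candidate wins,
/// which keeps the choice deterministic.
fn pick_cheapest(candidates: &[PlanStats]) -> Option<&PlanStats> {
    candidates.iter().min_by_key(|p| cost(p))
}
```

Under these weights a plan with 4 fetches at depth 3 beats one with 5 fetches at depth 3 even if it moves twice the payload, which matches the ordering of the factors in the list above.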

@shareable and plan optimization

# A subgraph
type Product @key(fields: "id") {
  id: ID!
  inStock: Boolean! @shareable
}
 
# B subgraph
type Product @key(fields: "id") {
  id: ID!
  inStock: Boolean! @shareable
  description: String!
}

Query:

{ productById(id:"p1") { description inStock } }

→ The router fetches both fields from B in a single request (routing inStock to A would take 2 fetches). So @shareable also works as a tool for reducing the number of fetches.
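
One way to picture this choice is as a covering problem: pick subgraphs so that every requested field is resolved with as few fetches as possible. The sketch below is a greedy approximation under that assumption, not the planner's actual algorithm; `assign_fields` and the `provides` map are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Greedy sketch: repeatedly pick the subgraph that resolves the most
/// still-missing fields. `provides` maps subgraph name -> fields it can resolve.
fn assign_fields<'a>(
    requested: &[&'a str],
    provides: &BTreeMap<&'a str, BTreeSet<&'a str>>,
) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut missing: BTreeSet<&'a str> = requested.iter().copied().collect();
    let mut fetches = Vec::new();
    while !missing.is_empty() {
        let best = provides
            .iter()
            .map(|(name, fields)| {
                let covered: Vec<&'a str> = fields.intersection(&missing).copied().collect();
                (*name, covered)
            })
            .max_by_key(|(_, covered)| covered.len());
        match best {
            Some((name, covered)) if !covered.is_empty() => {
                for field in &covered {
                    missing.remove(field);
                }
                fetches.push((name, covered));
            }
            // No subgraph can resolve what is left; stop instead of looping forever.
            _ => break,
        }
    }
    fetches
}
```

For the schema above, requesting description and inStock yields a single fetch to B, because B covers both fields while A covers only one.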

How: tools for visualizing a plan

1. Apollo Studio Explorer

The GraphOS Explorer can preview the plan before the query runs, and it renders the DAG visually.

2. Local `rover dev`

rover dev --supergraph-config ./supergraph.yaml
# → starts a local router; view the plan at http://localhost:4000/?explain=true

3. Router `--apollo-plan-output` (debug)

# router.yaml
plugins:
  apollo.include_subgraph_errors: true
  experimental.query_planner:
    debug:
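      max_plans_considered: 100   # how many candidate plans were considered

4. Requesting the plan via a header

POST /graphql
apollo-include-query-plan: true

→ The response includes the plan that was actually used. In production debugging this lets you trace which fetch was slow. Option and header names vary between Router versions, so check them against the configuration reference for the version you run.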

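How: Apollo Router's Rust implementation

Why Rust

Limits of the original Apollo Gateway (Node.js), with order-of-magnitude figures:

Item	Node.js Gateway	Rust Router
Concurrency	event loop (single thread)	Tokio (multi-threaded)
GC pauses	yes	none
Memory	high (V8 overhead)	low
p99 latency	100ms+	under 10ms
Throughput	1k RPS/instance	10k+ RPS/instance

→ The gateway is the single point every request passes through, so its performance translates directly into company-wide cost. The Rust rewrite was a decision about sustainability.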

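Tokio-based async

// Heavily simplified sketch of the router's executor. `QueryPlan`, `Services`,
// `Response`, and `merge_results` are placeholders, not Apollo Router's real types.
use futures::future::{join_all, BoxFuture, FutureExt};

// Recursive async calls need indirection, so each call returns a boxed future.
fn execute_plan<'a>(plan: &'a QueryPlan, services: &'a Services) -> BoxFuture<'a, Response> {
    async move {
        match plan {
            QueryPlan::Parallel(steps) => {
                // join_all polls every child future concurrently
                let results = join_all(steps.iter().map(|s| execute_plan(s, services))).await;
                merge_results(results)
            }
            QueryPlan::Sequence(steps) => {
                // Run children one at a time, merging each result as it arrives
                let mut result = Response::default();
                for step in steps {
                    let r = execute_plan(step, services).await;
                    result = merge_results(vec![result, r]);
                }
                result
            }
            QueryPlan::Fetch { service, query } => {
                // Leaf node: send the partial query to one subgraph
                services.get(service).execute(query).await
            }
            // ...
        }
    }
    .boxed()
}

→ Tens of thousands of concurrent queries can run efficiently in a single process. Tokio manages the worker thread pool on its own.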

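What: the reality of runtime cost

Latency overhead

Scenario	overhead
All fields live in one subgraph	~10ms (the router itself)
2 subgraphs with a dependency (sequential)	+1 hop = +20~50ms
N subgraphs with dependencies	+N-1 hops (fully sequential worst case)
N+1 occurs	+N fetches (a disaster)

Fan-out cost

The router sends requests to N subgraphs concurrently. As N grows, so does the load on the whole cluster. If one client request triggers calls to 5 subgraphs, every 1 external RPS turns into 5 internal RPS.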
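
The amplification can be estimated per subgraph by summing, for every operation, its external request rate over the subgraphs its plan touches. This is a back-of-the-envelope sketch: `subgraph_load` is a hypothetical helper and the traffic numbers in the usage note are invented.

```rust
use std::collections::BTreeMap;

/// Each operation is (external requests per second, subgraphs its plan calls).
/// Returns the internal request rate each subgraph sees.
fn subgraph_load(operations: &[(u64, &[&'static str])]) -> BTreeMap<&'static str, u64> {
    let mut load = BTreeMap::new();
    for (rps, subgraphs) in operations {
        for name in subgraphs.iter() {
            // Every external request produces one internal request per subgraph touched.
            *load.entry(*name).or_insert(0) += *rps;
        }
    }
    load
}
```

For example, a product page at 100 RPS touching five subgraphs plus a list view at 50 RPS touching two produces 600 internal RPS from 150 external RPS, and the subgraphs shared by both operations carry the sum of both rates.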

Memory cost

The router holds every partial response in memory while it merges them. A large list (for example, 1000 reviews) causes a memory spike. Streaming is inherently hard in federation.

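What-if: common pitfalls and fixes

Pitfall 1: the plan is too deep a sequence

call A → call B → call C → call D → ...   (5+ hops)

Cause: the @requires chain is too deep. For example, B needs A's result, C needs B's result, and so on.

Fix: redesign the schema with intermediate denormalization. For example, subgraph A keeps a cached copy of some of B's fields and returns them alongside its own via @provides.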

Pitfall 2: N+1 inside _entities

Step 1: Product subgraph → [product1, product2, ..., product100]
Step 2: call the Review subgraph with _entities(representations: [...])

→ If the Review subgraph runs one DB query per representation, that is N+1.

Fix: apply DataLoader in __resolveReference or in the root resolver. The router already sends the representations as one batch; the subgraph has to process them as a batch.
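
The batching idea behind DataLoader can be sketched without any framework: collect all keys from the `_entities` call, issue one lookup, then return results in the order of the incoming representations. `ReviewDb` below is a hypothetical in-memory stand-in for the review database, with a counter to show how many round trips were made; a real subgraph would use its server library's loader instead.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the review database; `queries` counts round trips.
struct ReviewDb {
    ratings: HashMap<String, Vec<u8>>,
    queries: usize,
}

impl ReviewDb {
    /// One query for many keys, analogous to `WHERE product_id IN (...)`.
    fn ratings_for(&mut self, product_ids: &[String]) -> HashMap<String, Vec<u8>> {
        self.queries += 1;
        product_ids
            .iter()
            .filter_map(|id| self.ratings.get(id).map(|r| (id.clone(), r.clone())))
            .collect()
    }
}

/// Resolves a whole `_entities` batch with a single database call and returns
/// results in the same order as the representations (empty when nothing is found).
fn resolve_product_reviews(db: &mut ReviewDb, representation_ids: &[String]) -> Vec<Vec<u8>> {
    let found = db.ratings_for(representation_ids);
    representation_ids
        .iter()
        .map(|id| found.get(id).cloned().unwrap_or_default())
        .collect()
}
```

The order-preserving step matters: the router matches `_entities` results to representations by position, so a batched resolver must not reorder or drop entries.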

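Pitfall 3: the plan is non-deterministic

The same query produces a different plan each time → poor cache efficiency and painful debugging.

Cause: a @shareable field exists in several subgraphs, so the router picks a different candidate each time.

Fix: enable experimental.query_planner.cache so plans are cached, or minimize the use of @shareable.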
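
What a plan cache buys can be shown in a few lines: key the cache on a normalized form of the query so that the planner runs once per distinct operation and every later request reuses the same plan. This is a conceptual sketch only. `PlanCache`, `normalize`, and the string-typed plan are hypothetical, and whitespace collapsing is a much cruder normalization than a real router would apply.

```rust
use std::collections::HashMap;

/// Hypothetical plan cache; plans are plain strings to keep the sketch small.
struct PlanCache {
    plans: HashMap<String, String>,
    misses: usize,
}

/// Crude normalization (assumption): collapse all whitespace runs to one space.
fn normalize(query: &str) -> String {
    query.split_whitespace().collect::<Vec<_>>().join(" ")
}

impl PlanCache {
    fn new() -> Self {
        PlanCache { plans: HashMap::new(), misses: 0 }
    }

    /// Returns the cached plan, invoking `planner` only on the first request
    /// for a given normalized query.
    fn get_or_plan(&mut self, query: &str, planner: impl FnOnce(&str) -> String) -> &str {
        let key = normalize(query);
        if !self.plans.contains_key(&key) {
            self.misses += 1;
            let plan = planner(&key);
            self.plans.insert(key.clone(), plan);
        }
        &self.plans[&key]
    }
}
```

Because the second request never reaches the planner, it cannot receive a different plan, which addresses both the caching and the debugging complaint.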

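Pitfall 4: one subgraph goes down and the whole query fails

Pricing subgraph down → only price fails in the productById query → the query returns a partial error

→ GraphQL supports partial responses, so the other fields survive. The client does have to handle the errors array. Note that if the failed field is non-nullable, the null propagates up to the nearest nullable parent, which can wipe out the whole productById object.

Fix: per-subgraph fallbacks and a circuit breaker (the router's traffic shaping plugin).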
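
The shape of a partial response can be sketched as a merge step that keeps every successful field, nulls the failed one, and records an error entry for it. `MergedResponse` and `merge_partial` are hypothetical and model only nullable fields; they are not the router's response types.

```rust
/// Hypothetical merged response: each field is either a value or null (None),
/// and failures are listed separately, mirroring GraphQL's `data` + `errors`.
#[derive(Debug, Default, PartialEq)]
struct MergedResponse {
    data: Vec<(String, Option<String>)>,
    errors: Vec<String>,
}

/// Merges per-field results from different subgraphs. A failed fetch nulls
/// only its own field (assuming the field is nullable) and adds one error.
fn merge_partial(results: Vec<(&str, Result<String, String>)>) -> MergedResponse {
    let mut merged = MergedResponse::default();
    for (field, outcome) in results {
        match outcome {
            Ok(value) => merged.data.push((field.to_string(), Some(value))),
            Err(message) => {
                merged.data.push((field.to_string(), None));
                merged.errors.push(format!("{}: {}", field, message));
            }
        }
    }
    merged
}
```

A client reading this response has to check `errors` before trusting a null in `data`, since a null price could mean either "no price" or "Pricing was down".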

Pitfall 5: fan-out over a large list

query { products(first: 1000) { reviews { rating } } }

→ 1000 products × a Review fetch = 1000 representations in one _entities call. The Review subgraph either hits the DB 1000 times or issues a single IN clause with 1000 values. Both are a problem.

Fix: enforce pagination (cap the maximum value of first) and revisit the data model (cache a review summary on Product).
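
Capping `first` is a small validation step in the resolver that owns the list field. The sketch below rejects oversized pages rather than silently clamping them; the limits (default 20, maximum 100) and the name `validate_first` are assumptions chosen for illustration.

```rust
/// Assumed limits for illustration; pick values that fit your data.
const DEFAULT_PAGE_SIZE: u32 = 20;
const MAX_PAGE_SIZE: u32 = 100;

/// Validates the `first` argument of a paginated list field.
/// Rejecting (instead of clamping) makes the limit visible to the client.
fn validate_first(first: Option<u32>) -> Result<u32, String> {
    match first {
        None => Ok(DEFAULT_PAGE_SIZE),
        Some(0) => Err("first must be at least 1".to_string()),
        Some(n) if n > MAX_PAGE_SIZE => Err(format!(
            "first must be at most {}, got {}",
            MAX_PAGE_SIZE, n
        )),
        Some(n) => Ok(n),
    }
}
```

With this check in place, `products(first: 1000)` fails fast in the Product subgraph and the 1000-representation `_entities` call never happens.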

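Insight: interesting stories

“Apollo Router borrowed from DB query optimizers”

Apollo's router design documents cite the PostgreSQL planner and Calcite. Join-order optimization in SQL has the same problem structure as fetch-order optimization. The pattern of a cost function plus candidate-plan comparison is decades of compiler and database know-how carried over into federation.

“WunderGraph attempts compile-time planning”

Apollo Router generates plans at runtime. WunderGraph precomputes the plan for every query at build time (operations.graphql plus the persisted query pattern), so planning adds no runtime cost. The downside: no dynamic queries, because every query must be registered in advance.

→ This is where persisted queries (chapter 05) and federation meet.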

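“Netflix's Studio Edge built its own router”

In 2019, before Apollo Router existed, Netflix built its own federation gateway; its subgraph services are the DGSs (Domain Graph Services). The reason: the gateway had to integrate with Netflix's own tracing system (Mantis) and organizational policies. Netflix still follows the Apollo Federation spec but maintains its own implementation. That is the effect of an open spec: because Apollo published it, anyone can build a compatible implementation.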

“The query plan determines server cost as much as client latency”

A good plan = fewer fetches = fewer internal RPCs. If one external query becomes 5 internal RPCs, the cluster carries 5x the load. So plan optimization addresses both client UX and server cost, which is why Apollo invests so heavily in it.

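Summary

The router decomposes a query into a DAG plan against the supergraph SDL. A good plan minimizes fetches, maximizes parallelism, and batches. A bad plan has deep sequences and N+1. Thanks to the Rust implementation the router's own overhead is around 10ms, but fan-out and sequential dependencies are structural costs that do not go away.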
