An easier method of polynomial composition: No more transposition theorem is needed!

· · 算法·理论

Just some random thoughts arised by a simple problem. But it seems that I'm reinventing wheels once more :(

If there's any mistake in the article, please don't hesitate to point it out in the comment.

Introduction

Yesterday night, my friend asked me a question on polynomials (his profile picture is pixelated in order to prevent privacy issues):

(Translation: Ciallo! Is there any solution for [x^n]f^k(x) for all k=1\sim n in \mathcal O(n ^ 2) time complexity that is easy to implement?)

And unfortunately I didn't figure out the solution at first. I even considered it as an unsolvable problem:

(Translation: It's actually unsolvable. I guess, isn't it?)

But I suddenly realized that in the latest algorithm for polynomial composition, we have done a similar process working on f^k(x).

So it might be solvable in \mathcal O(n\log ^ 2n)?

Why not give it a try?

Solution

We need to get coefficients for every possible k. Let's introduce another term y to control the number of f in a specific product, and rewrite the coefficient as [y^kx^n]1+yf(x)+y^2f^2(x)+\dots=[y^kx^n]\frac{1}{1-yf(x)}.

Do you remember how we handled the linear homogeneous recursions using polynomials? We transformed it into an evaluation of a certain coefficient, in the form of a fraction: [x^k]\frac{P(x)}{Q(x)}. We apply Bostan-Mori algorithm on this fraction afterwards.

For the bivariate polynomial with x, y, we could apply a similar technique. Let's say if we are now finding the coefficient [y^kx^n]\frac{P(x, y)}{Q(x, y)}. We multiply by Q(-x, y) on both numerator and denominator to eliminate odd powers of x on the denominator, resulting in [y^kx^n]\frac{P(x, y)Q(-x, y)}{Q'(x ^ 2, y)}, where Q'(x^2, y) = Q(x, y)Q(-x, y).

After this iteration, we can only conserve about half of the elements depending on the parity of n: if n is even, we rewrite it as \frac{P'(x^2, y)}{Q'(x^2, y)}, otherwise we rewrite it as \frac{xP'(x^2, y)}{Q'(x^2, y)}. Now the problem is reduced to a subproblem with same form with the degree of x halved. When n = 0, the variable x is entirely eliminated, and we only need to get [y^k]\frac{P(y)}{Q(y)} using polynomial inversion in \mathcal O(n\log n) time. (Can be done in \mathcal O(n), but not necessary.)

Consider the time complexity of the algorithm: after each iteration, although the degree of y is lifted to twice as before, the degree of x is halved. Thus, for every polynomial involved in the calculation, the product of degree for x, y is always \mathcal O(n), so we can multiply two polynomials within \mathcal O(n\log n) time.

We handled \mathcal O(\log n) multiplications, because after this number of iterations we will drop into the corner case of n = 0. Thus, the whole time complexity is \mathcal O(n\log ^ 2 n).

Polynomial Composition

Now let us move on to polynomial composition. Suppose we have two polynomials f, g, and a new polynomial g(f(x)) = g_0+g_1f^1(x)+g_2f^2(x)+\dots+g_nf^n(x) is required. (Assume that calculation is done under \bmod \ x^{n + 1}.) The expression seems to have many similarities with the previous question.

We need to assign a coefficient for each single f^k(x) depending on g(x). We can still maintain the variable y, since we know that the term with y^k should be multiplied by g_k.

Thus, we are actually multiplying the whole expression with g_0y^0+g_1y^{-1}+g_2y^{-2}+\dots+g_ny^{-n} to let every valid term be of y^0. With the help of this polynomial, the answer could be written as:

[y^0]\frac{g(y^{-1})}{1-yf(x)}\bmod \ x^{n + 1}

The following step is just the same as we have done in the previous part. Moreover, we need to store every coefficient from x^0 to x^n, which is slightly different. But an advantage is that the numerator only relates to y, instead of both x, y.

Suppose we are handling \frac{P(y)}{Q(x, y)}. Still multiplying by Q(-x, y), getting \frac{P(y)Q(-x, y)}{Q'(x ^ 2, y)}. But the problem is: if we still do casework on the parity of n, we will get two branches, leading to wrong time complexity as T(m)=2T(m/2)+\mathcal O(n\log n).

Can we avoid operations on x? Putting \frac{P(y)}{Q'(x^2, y)} into the recursion might be a good choice. Now we have the results for \frac{P(y)}{Q'(x^2, y)},and we need to multiply it by Q(-x, y).

For the first iteration, as we need coefficient of [y ^ 0], we just need to focus on [y^0] and [y ^ {-1}] of \frac{P(y)}{Q'(x^2, y)}, since \deg_y(Q) = 1. Now \deg_y(Q')=2, and we need [y^0], [y^{-1}],[y^{-2}], [y^{-3}] in the next iteration. So the number of needed values is depending on \deg_y(Q). More specifically, we just need to figure out [y^{-\deg_y(Q) + 1}]\sim [y^0] in order to correctly deal with the previous iterations.

As we have said in the previous part, \deg_x\deg_y = \mathcal O(n), and \sum \deg_y(Q)=\sum \limits_{k = 0}^{\log n}\mathcal O(2^k) = \mathcal O(n), so the time complexity is still \mathcal O(n\log n) per iteration, \mathcal O(n\log ^ 2 n) in total. When n = 0, an inverse is needed, which can also be done in \mathcal O(n \log n).

Thus, polynomial composition under modulo x^{n + 1} is solved in \mathcal O(n\log ^2 n) time complexity.

Code

:::success[P10249 (n = 2e5) runtime = 11s on luogu]

#include <bits/stdc++.h>
using ll = long long;
using ld = long double;
using ull = unsigned long long;
using namespace std;
template <class T>
using Ve = vector<T>;
#define ALL(v) (v).begin(), (v).end()
#define pii pair<ll, ll>
#define rep(i, a, b) for(int i = (a); i <= (b); ++i)
#define per(i, a, b) for(int i = (a); i >= (b); --i)
#define pb push_back
bool Mbe;
ll read() {
    ll x = 0, f = 1; char ch = getchar();
    while(ch < '0' || ch > '9') {if(ch == '-') f = -1; ch = getchar();}
    while(ch >= '0' && ch <= '9') x = x * 10 + ch - '0', ch = getchar();
    return x * f;
}
void write(ll x) {
    if(x < 0) putchar('-'), x = -x;
    if(x > 9) write(x / 10);
    putchar(x % 10 + '0');
}
const ll N = 2e5 + 9;
const ll Mod = 998244353, G = 3, iG = (Mod + 1) / 3;
struct poly {
    Ve<int> a;
    int size() const {return a.size();}
    void resize(int n) {a.resize(n);}
    int operator[] (int n) const {
        assert(0 <= n && n < (int)a.size());
        return a[n];
    } 
    int &operator[] (int n) {
        assert(0 <= n && n < (int)a.size());
        return a[n];
    }
};
int rev[N << 5], tw[N << 5];
int Init(int n) {
    int p = 1, c = 0;
    while(p <= n) p <<= 1, ++c;
    rev[0] = 0;
    rep(i, 1, p - 1) rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (c - 1));
    return p;
}
int pw(int x, int p) {
    int res = 1;
    while(p) {
        if(p & 1) res = 1ll * res * x % Mod;
        x = 1ll * x * x % Mod, p >>= 1;
    }
    return res;
}
int Add(int x, int y) {
    return ((x += y) >= Mod) ? (x - Mod) : x;
}
int Sub(int x, int y) {
    return ((x -= y) < 0) ? (x + Mod) : x;
}
void NTT(poly &a, int n, int sgn) {
    rep(i, 0, n - 1) {
        if(i < rev[i]) swap(a[i], a[rev[i]]);
    }
    for(int i = 1; i < n; i <<= 1) {
        int wn = pw((sgn == 1 ? G : iG), (Mod - 1) / (i << 1));
        tw[0] = 1;
        rep(j, 1, i - 1) tw[j] = 1ll * tw[j - 1] * wn % Mod;
        for(int j = 0; j < n; j += (i << 1)) {
            rep(k, 0, i - 1) {
                int x = a[j + k], y = 1ll * a[j + k + i] * tw[k] % Mod;
                a[j + k] = Add(x, y), a[j + k + i] = Sub(x, y);
            }
        }
    }
    if(~sgn) return ;
    int iv = pw(n, Mod - 2);
    rep(i, 0, n - 1) a[i] = 1ll * a[i] * iv % Mod;
    return ;
}
poly operator * (const poly &a, const poly &b) {
    ll n = a.size(), m = b.size();
    ll p = Init(n + m - 1);
    poly f = a, g = b; f.resize(p), g.resize(p);
    NTT(f, p, 1), NTT(g, p, 1);
    rep(i, 0, p - 1) f[i] = 1ll * f[i] * g[i] % Mod;
    NTT(f, p, -1), f.resize(n + m - 1);
    return f;
}
poly Inv(poly h, int n){
    if(n == 1){
    poly f;
        f.resize(1), f[0] = pw(h[0], Mod - 2);
        return f;
    }
    int n0 = (n + 1) >> 1;
    poly h0 = h; h0.resize(n0);
    poly f0 = Inv(h0, n0); f0.resize(n);
    int len = Init(n << 1);
    poly _f0 = f0; _f0.resize(len), h.resize(len);
    NTT(_f0, len, 1), NTT(h, len, 1);
    rep(i, 0, len - 1) _f0[i] = 1ll * _f0[i] * _f0[i] % Mod * h[i] % Mod;
    NTT(_f0, len, -1), _f0.resize(n);
    rep(i, 0, n - 1) f0[i] = (f0[i] * 2ll - _f0[i] + Mod) % Mod;
    return f0;
}
Ve<poly> work(const poly &P, const Ve<poly> &Q, int n) {
    int maxy = (int)Q.size() - 1;
    if(!n) {
        poly Q0; Q0.resize(maxy + 1);
        rep(i, 0, maxy) Q0[i] = Q[i][0];
        Q0 = Inv(Q0, maxy + 1);
        int lenP = P.size(); poly R = P;
        reverse(ALL(R.a));
        R = R * Q0;
        poly res; res.resize(maxy);
        rep(i, 0, (int)R.size() - 1) {
            int cy = i - (lenP - 1);
            if(cy <= 0 && cy >= -maxy + 1) res[-cy] = R[i];
        }
        Ve<poly> ret; ret.resize(maxy);
        rep(i, 0, maxy - 1) ret[i].resize(1), ret[i][0] = res[i];
        return ret;
    }
    int n2 = (n << 1) | 1;
    int len = Init(n2 * max(maxy + 1, maxy * 2) + n2 * (maxy + 1));
    poly pos, neg; pos.resize(len), neg.resize(len);
    rep(i, 0, maxy) {
        rep(j, 0, (int)Q[i].size() - 1) {
            int pwr = i * n2 + j;
            pos[pwr] = Q[i][j];
            if(j & 1) neg[pwr] = (Mod - Q[i][j]);
            else neg[pwr] = Q[i][j];
        }
    }
    NTT(neg, len, 1);
    NTT(pos, len, 1);
    rep(i, 0, len - 1) pos[i] = 1ll * pos[i] * neg[i] % Mod;
    NTT(pos, len, -1);
    Ve<poly> Q0; Q0.resize(maxy * 2 + 1);
    rep(i, 0, maxy * 2) Q0[i].resize((n >> 1) + 1);
    rep(i, 0, (int)pos.size() - 1) {
        int cy = i / n2, cx = i - cy * n2;
        if(cx & 1) continue;
        if(cx > n || cy > maxy * 2) continue;
        Q0[cy][cx >> 1] = pos[i];
    }
    Ve<poly> ret0 = work(P, Q0, n >> 1);
    int len0 = ret0.size();
    reverse(ALL(ret0));
    Init(len - 1);
    pos.a.clear(), pos.resize(len);
    rep(i, 0, len0 - 1) {
        rep(j, 0, (int)ret0[i].size() - 1) {
            int pwr = i * n2 + j * 2;
            pos[pwr] = ret0[i][j];
        }
    }
    NTT(pos, len, 1);
    rep(i, 0, len - 1) pos[i] = 1ll * pos[i] * neg[i] % Mod;
    NTT(pos, len, -1);
    Ve<poly> ret; ret.resize(maxy);
    rep(i, 0, maxy - 1) ret[i].resize(n + 1);
    rep(i, 0, (int)pos.size() - 1) {
        int cy = i / n2, cx = i - cy * n2;
        cy -= (len0 - 1);
        if(cy <= 0 && cy >= -maxy + 1) {
            if(cx <= n) ret[-cy][cx] = pos[i];
        }
    }
    return ret;
}
int n, m;
poly f, g;
bool Med;
int main() {
    cerr << fabs(&Med - &Mbe) / 1048576.0 << "MB\n";
    n = read(), m = read();
    f.resize(n + 1), g.resize(m + 1);
    rep(i, 0, n) f[i] = read();
    rep(i, 0, m) g[i] = Mod - read();
    poly con; con.resize(1), con[0] = 1;
    Ve<poly> h = work(f, {con, g}, n);
    assert(h.size() == 1);
    rep(i, 0, n) write(h[0][i]), putchar(' ');
    putchar('\n');
    cerr << "\n" << clock() * 1.0 / CLOCKS_PER_SEC * 1000 << "ms\n";
    return 0;
}

:::

References:

  1. Yasunori Kinoshita, Baitian Li: Power Series Composition in Near-Linear Time.

  2. alpha1022: 多项式不存在了:多项式复合(逆)的 O(nlog^2n) 做法.