矩阵加速入门

qW__Wp · 2025-11-20 16:42:00 · 算法·理论

一、矩阵乘法

1.1 定义与运算

矩阵乘法常常指一般矩阵乘法。一个 n \times p 的矩阵 A 与一个 p \times m 的矩阵 B 相乘，得到一个 n \times m 的矩阵 P。具体地，有

P_{i,j} = \sum\limits_{k=1}^p A_{i,k} \times B_{k,j}

举个例子：

\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \times \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}

将得到

\begin{bmatrix} 1 \times 1 + 2 \times 4 & 1 \times 2 + 2 \times 5 & 1 \times 3 + 2 \times 6 \\ 3 \times 1 + 4 \times 4 & 3 \times 2 + 4 \times 5 & 3 \times 3 + 4 \times 6 \\ 5 \times 1 + 6 \times 4 & 5 \times 2 + 6 \times 5 & 5 \times 3 + 6 \times 6 \end{bmatrix}

通俗地理解，P_{i,j} 就是 A 矩阵的第 i 行与 B 矩阵的第 j 列的对应位置乘积之和。体现在代码上，就是：

for (int i = 1; i <= n; i ++) {
    for (int j = 1; j <= m; j ++) {
        for (int k = 1; k <= p; k ++) {
            c[i][j] += a[i][k] * b[k][j];
        }
    }
}

从上面可以看出，矩阵乘法 A \times B 要求矩阵 A 的列数与矩阵 B 的行数相等。

1.2 基本性质

具有：
- 结合律：(AB)C = A(BC)。
- 分配律：A(B+C) = AB + AC、(B+C)A = BA + CA。
这两点尤其重要。
不具有：
- 往往不具有交换律：A \times B 合法，B \times A 不一定合法。即使都合法，两者不一定相等。
- 往往不具有消去律：满足 AB = AC 不一定有 B = C。满足 AB = 0 不一定有 A 或 B 为零矩阵。

1.3 单位矩阵

单位矩阵 I 是一个方阵，其主对角线上的元素均为 1，其余元素均为 0。即

I = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}

其性质为：任何矩阵与单位矩阵相乘都得原矩阵。

1.4 矩阵快速幂

P3390 【模板】矩阵快速幂

前置芝士：快速幂

了解了矩阵乘法，矩阵快速幂就很简单了。将矩阵封装为结构体并重载乘号，然后按照普通快速幂的方法即可，时间复杂度 O(n^2 \log k)。

::::success[Code]

#include <bits/stdc++.h>
#define int long long
//#define INF 0x3f3f3f3f
#define endl '\n'

using namespace std;

const int N = 105, Mod = 1e9 + 7;

int n, k;

struct Matrix {
    int a[N][N];
    Matrix() {
        memset(a, 0, sizeof a);
    }
    void build() {
        for (int i = 1; i <= n; i ++) a[i][i] = 1;
    }
};

void Add(int &x, int y) {
    x += y; if (x >= Mod) x -= Mod;
}

Matrix operator *(const Matrix &x, const Matrix y) {
    Matrix z;
    for (int i = 1; i <= n; i ++) {
        for (int j = 1; j <= n; j ++) {
            for (int k = 1; k <= n; k ++) {
                Add(z.a[i][j], x.a[i][k] * y.a[k][j] % Mod);
            }
        }
    }
    return z;
}

signed main() {
//  freopen("", "r", stdin);
//  freopen("", "w", stdout);
//  ios::sync_with_stdio(0); cin.tie(0);
    cin >> n >> k;
    Matrix x;
    for (int i = 1; i <= n; i ++) {
        for (int j = 1; j <= n; j ++) {
            cin >> x.a[i][j];
        }
    }
    Matrix ans; ans.build();
    while (k) {
        if (k & 1) ans = ans * x;
        x = x * x, k >>= 1;
    }
    for (int i = 1; i <= n; i ++) {
        for (int j = 1; j <= n; j ++) {
            cout << ans.a[i][j] << ' ';
        }
        cout << endl;
    }
    return 0;
}

::::

二、应用

一大堆题目来袭。

套路什么的，题目做多了就会了，这里先不讲。

2.1 加速递推式

P1962 斐波那契数列

::::info[省流]

\begin{bmatrix} F_n & F_{n-1} \\ F_{n-1} & F_{n-2} \end{bmatrix}= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} ^ {n-1}

::::

考虑斐波那契数列相邻两项构成的矩阵 \begin{bmatrix} F_n & F_{n-1} \end{bmatrix}。由于

\begin{bmatrix} F_n & F_{n-1} \end{bmatrix} = \begin{bmatrix} F_{n-1} + F_{n-2} & F_{n-1} \end{bmatrix}

所以

\begin{bmatrix} F_n & F_{n-1} \end{bmatrix} = \begin{bmatrix} F_{n-1} & F_{n-2} \end{bmatrix} \times \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}

于是由 \begin{bmatrix} F_2 = 1 & F_1 = 1 \end{bmatrix} 进行 n - 2 次乘的操作即可得到 \begin{bmatrix} F_n & F_{n-1} \end{bmatrix}。即：

\times \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} ^ {n-2}

用矩阵快速幂求出 A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} ^ {n-2}，则 F_n = A_{1,1} + A_{2,1}。

::::success[Code]

#include <bits/stdc++.h>
#define ll long long
//#define INF 0x3f3f3f3f
#define endl '\n'

using namespace std;

const int Mod = 1e9 + 7;

struct Mat {
    ll a[2][2];
    Mat () { memset(a, 0, sizeof a); }
    Mat operator*(const Mat& b) const {
        Mat res;
        for (int i = 0; i < 2; i ++) {
            for (int j = 0; j < 2; j ++) {
                for (int k = 0; k < 2; k ++) {
                    res.a[i][j] = (res.a[i][j] + a[i][k] * b.a[k][j]) % Mod;
                }
            }
        }
        return res;
    }
};

int fib(ll n) {
    if (n <= 2) return 1;
    Mat ans, base;
    base.a[0][0] = p, base.a[1][0] = q, base.a[1][1] = 1;
    ans.a[0][0] = ans.a[1][1] = 1;
    n -= 2;
    while (n) {
        if (n & 1) ans = ans * base;
        base = base * base, n >>= 1;
    }
    return (ans.a[0][0] + ans.a[1][0]) % Mod;
}

signed main() {
//  freopen("", "r", stdin);
//  freopen("", "w", stdout);
//  ios::sync_with_stdio(0); cin.tie(0);
    ll n; cin >> n;
    cout << fib(n);
    return 0;
}

::::

P1306 斐波那契公约数

求 \gcd(F_n,F_m)。考察了斐波那契数列的性质 \gcd(F_n,F_m) = F_{\gcd(n,m)}。笔者数学不好，就不证了。

P1349 广义斐波那契数列

灵活变化转移矩阵。考虑

\begin{bmatrix} a_n & a_{n-1} \end{bmatrix} = \begin{bmatrix} p \times a_{n-1} + q \times a_{n-2} & a_{n-1} \end{bmatrix}

于是

\begin{bmatrix} a_n & a_{n-1} \end{bmatrix} = \begin{bmatrix} a_{n-1} & a_{n-2} \end{bmatrix} \times \begin{bmatrix} p & 1 \\ q & 0 \end{bmatrix}

P1939 矩阵加速（数列）

我们希望构造矩阵 B 使

\begin{bmatrix}a_x & a_{x-1} & a_{x-2}\end{bmatrix} = \begin{bmatrix}a_{x-1} & a_{x-2} & a_{x-3}\end{bmatrix} \times B

由于 a_x = 1 \cdot a_{x-1} + 0 \cdot a_{x-2} + 1 \cdot a_{x-3}，所以转移矩阵第一列为 \begin{bmatrix}1 \\ 0 \\ 1\end{bmatrix}。

同理求出转移矩阵第二、三列，最终得到

\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}

求出 B \gets B^{n-3}，答案即为 B_{1,1}+B_{2,1}+B_{3,1}。

P1397 [NOI2013] 矩阵游戏

::::info[题意]{open} 给定 n,m,a,b,c,d，对于递推式

F(i,j) = \begin{cases} 1 & i = 1 \wedge j = 1 \\ c \times F(i-1,m) + d & i > 1 \wedge j = 1 \\ a \times F(i,j-1) + b & \text{otherwise} \end{cases}

求 F(n,m)，答案对 10^9 + 7 取模。

:::: 转移可以写成矩阵的形式： $$ \begin{bmatrix} f_{i,j} & 1 \end{bmatrix} = \begin{bmatrix} f_{i,j-1} & 1 \end{bmatrix} \times \begin{bmatrix} a & 0 \\ b & 1 \end{bmatrix} $$ $$ \begin{bmatrix} f_{i,1} & 1 \end{bmatrix} = \begin{bmatrix} f_{i-1,m} & 1 \end{bmatrix} \times \begin{bmatrix} c & 0 \\ d & 1 \end{bmatrix} $$ 令 $A = \begin{bmatrix} a & 0 \\ b & 1 \end{bmatrix}$，$B = \begin{bmatrix} c & 0 \\ d & 1 \end{bmatrix}$，则答案为： $$ \begin{bmatrix} 1 & 1 \end{bmatrix} \times A^{m-1}\underbrace{(A^{m-1}B)(A^{m-1}B)\dots(A^{m-1}B)}_{n - 1 \text{ 个}} $$ 使用矩阵快速幂计算 $A^{m-1}$ 和 $(A^{m-1}B)^{n-1}$。需要使用「十进制快速幂」。 ::::warning[注意]{open} 矩阵乘法不具有交换律，故不能写作 $A^{n(m-1)} B^{n-1}$。 :::: ::::success[Code] ```cpp #include <bits/stdc++.h> #define endl '\n' //#define int long long using namespace std; typedef long long ll; const int Mod = 1e9 + 7; struct Matrix { ll a[2][2]; Matrix () { memset(a, 0, sizeof a); } Matrix operator*(const Matrix& b) const { Matrix c; c.a[0][0] = (a[0][0] * b.a[0][0] + a[0][1] * b.a[1][0]) % Mod; c.a[0][1] = (a[0][0] * b.a[0][1] + a[0][1] * b.a[1][1]) % Mod; c.a[1][0] = (a[1][0] * b.a[0][0] + a[1][1] * b.a[1][0]) % Mod; c.a[1][1] = (a[1][0] * b.a[0][1] + a[1][1] * b.a[1][1]) % Mod; return c; } }; Matrix qpow(Matrix bs, string n) { Matrix res; res.a[0][0] = res.a[1][1] = 1; for (int i = (int)n.size() - 1; i >= 0; i --) { if (n[i] - '0') { int x = n[i] - '0'; Matrix bss = bs; while (x) { if (x & 1) res = res * bss; bss = bss * bss, x >>= 1; } } int x = 9; Matrix bss = bs; while (x) { if (x & 1) bs = bs * bss; bss = bss * bss, x >>= 1; } } return res; } string change(string s) { for (int i = (int)s.size() - 1; i >= 0; i --) { if (s[i] != '0') { s[i] --; break; } else { s[i] = '9'; } } return s; } signed main() { ios::sync_with_stdio(0), cin.tie(0); string n, m; int a, b, c, d; cin >> n >> m >> a >> b >> c >> d; Matrix A, B; A.a[0][0] = a, A.a[1][0] = b, A.a[1][1] = 1; B.a[0][0] = c, B.a[1][0] = d, B.a[1][1] = 1; Matrix nA = qpow(A, change(m)); Matrix res = qpow(nA * B, change(n)) * nA; cout << (res.a[0][0] + res.a[1][0]) % Mod; return 0; } ``` :::: ### 2.2 优化动态规划 ##### [P2106 Sam 数](https://www.luogu.com.cn/problem/P2106) 考虑朴素 dp，设 $f_{i,j}$ 表示考虑前 $i$ 位，第 $i$ 位为 $j$ 的方案数。易得转移 $f_{i,j} \gets \sum\limits_{|j - k| \le 2} f_{i-1,k}$。容易得到转移矩阵 $B$，具体地，若 $f_{i-1,k}$ 可以转移到 $f_{i,j}$，则 $B_{j,i} = 1$。转移 $k-1$ 次得到 $B^{k-1}$。初始矩阵为 $\begin{bmatrix}0 & 1 & 1 & \dots & 1\end{bmatrix}$，因为不能有前导零。特别地，当 $k=1$ 时答案为 $10$。 ::::success[Code] ```cpp #include <bits/stdc++.h> #define endl '\n' using namespace std; const int Mod = 1e9 + 7; struct Matrix { long long a[10][10]; Matrix () { memset(a, 0, sizeof a); } Matrix operator*(const Matrix& b) const { Matrix c; for (int i = 0; i < 10; i ++) { for (int j = 0; j < 10; j ++) { for (int k = 0; k < 10; k ++) { c.a[i][j] += a[i][k] * b.a[k][j]; } c.a[i][j] %= Mod; } } return c; } } bs; void init() { for (int i = 0; i < 10; i ++) { for (int j = max(0, i - 2); j <= min(9, i + 2); j ++) { bs.a[j][i] = 1; } } } Matrix get(long long n, Matrix b) { Matrix res; for (int i = 0; i < 10; i ++) res.a[i][i] = 1; while (n) { if (n & 1) res = res * b; b = b * b, n >>= 1; } return res; } signed main() { // freopen("", "r", stdin); // freopen("", "w", stdout); // ios::sync_with_stdio(0), cin.tie(0); long long k; cin >> k; if (k == 1) { cout << 10; return 0; } init(); Matrix res = get(k - 1, bs); long long ans = 0; for (int i = 0; i < 10; i ++) for (int j = 1; j < 10; j ++) ans = (ans + res.a[i][j]) % Mod; cout << ans; return 0; } ``` :::: --- ##### [P3216 [HNOI2011] 数学作业](https://www.luogu.com.cn/problem/P3216) 有转移 $f_i = f_{i-1} \times 10^k + i$，其中 $k = \log_{10}i$。写成矩阵形式： $$ \begin{bmatrix} f_i & i & 1 \end{bmatrix} = \begin{bmatrix} f_{i-1} & i-1 & 1 \end{bmatrix} \times \begin{bmatrix} 10^k & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} $$ 枚举 $k$ 分段做矩阵快速幂。 ::::success[Code] ```cpp #include <bits/stdc++.h> #define endl '\n' #define int long long using namespace std; int Mod; struct Matrix { int a[3][3]; Matrix () { memset(a, 0, sizeof a); } Matrix operator*(const Matrix& b) const { Matrix c; c.a[0][0] = (a[0][0] * b.a[0][0] + a[0][1] * b.a[1][0] + a[0][2] * b.a[2][0]) % Mod; c.a[0][1] = (a[0][0] * b.a[0][1] + a[0][1] * b.a[1][1] + a[0][2] * b.a[2][1]) % Mod; c.a[0][2] = (a[0][0] * b.a[0][2] + a[0][1] * b.a[1][2] + a[0][2] * b.a[2][2]) % Mod; c.a[1][0] = (a[1][0] * b.a[0][0] + a[1][1] * b.a[1][0] + a[1][2] * b.a[2][0]) % Mod; c.a[1][1] = (a[1][0] * b.a[0][1] + a[1][1] * b.a[1][1] + a[1][2] * b.a[2][1]) % Mod; c.a[1][2] = (a[1][0] * b.a[0][2] + a[1][1] * b.a[1][2] + a[1][2] * b.a[2][2]) % Mod; c.a[2][0] = (a[2][0] * b.a[0][0] + a[2][1] * b.a[1][0] + a[2][2] * b.a[2][0]) % Mod; c.a[2][1] = (a[2][0] * b.a[0][1] + a[2][1] * b.a[1][1] + a[2][2] * b.a[2][1]) % Mod; c.a[2][2] = (a[2][0] * b.a[0][2] + a[2][1] * b.a[1][2] + a[2][2] * b.a[2][2]) % Mod; return c; } }; Matrix qpow(Matrix bs, int b) { Matrix res; res.a[0][0] = res.a[1][1] = res.a[2][2] = 1; while (b) { if (b & 1) res = res * bs; bs = bs * bs, b >>= 1; } return res; } signed main() { // freopen("", "r", stdin); // freopen("", "w", stdout); ios::sync_with_stdio(0), cin.tie(0); int n; cin >> n >> Mod; long long p = 1; int now = 0; for (int i = 1; ; i ++) { Matrix b; b.a[0][0] = now, b.a[1][1] = (p - 1) % Mod, b.a[2][2] = 1; Matrix bs; bs.a[0][0] = p % Mod * 10 % Mod, bs.a[1][0] = bs.a[1][1] = bs.a[2][0] = bs.a[2][1] = bs.a[2][2] = 1; Matrix res; if (p - 1 >= n / 10 || (n % 10 != 0 && p - 1 >= n / 10)) { res = b * qpow(bs, n - p + 1); now = (res.a[0][0] + res.a[1][0] + res.a[2][0]) % Mod; break; } else { res = b * qpow(bs, p * 9); p *= 10; now = (res.a[0][0] + res.a[1][0] + res.a[2][0]) % Mod; } } cout << now; return 0; } ``` :::: --- ##### 模拟赛题，来源未知 ::::info[题意]{open} 给定 $n$ 位二进制数 $S$ 和常数 $k$，求 $$\sum\limits_{x = 0}^S [\operatorname{popcount}(x) = k] \times F_x$$ 其中 $\operatorname{popcount}(x)$ 为 $x$ 的二进制下 $1$ 的数量，$F_x$ 为斐波那契数列中的第 $x$ 项。具体地，$F_0 = F_1 = 1,F_n = F_{n-1} + F_{n-2}$。 $1 \le k \le n \le 3 \times 10 ^ 3$，保证 $S$ 无前导零。 :::: 本题考察矩阵乘法的结合律。对于 $i \in [0,n)$，预处理 $F_{2^i}$ 的转移矩阵 $G_i = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^{2^i} = G_{i-1}^2$ 即可快速求出 $F_x$。先考虑 $S$ 全为 $1$ 的情况。此时相当于在 $n$ 位数中选择 $k$ 位的所有方案的转移矩阵乘积之和，设 $f_{i,j}$ 表示前 $i$ 位选 $j$ 位的答案矩阵，容易写出 $O(nk)$ 的 dp。对于 $x \le S$ 的限制，类似数位 dp，在状态中加入一维 $0/1$ 表示前 $i$ 位是否紧贴上界即可。时间复杂度 $O(nk)$，带 $2^4$ 常数。 ::::success[Code] ```cpp #include <bits/stdc++.h> #define ll long long #define endl '\n' using namespace std; const int N = 2005, Mod = 1e9 + 7; char s[N]; struct Mat { ll a[2][2]; Mat () { memset(a, 0, sizeof a); } Mat operator*(const Mat& b) const { Mat c; for (int i = 0; i < 2; i ++) for (int j = 0; j < 2; j ++) c.a[i][j] = (a[i][0] * b.a[0][j] + a[i][1] * b.a[1][j]) % Mod; return c; } void operator+=(const Mat& b) { for (int i = 0; i < 2; i ++) for (int j = 0; j < 2; j ++) a[i][j] = (a[i][j] + b.a[i][j]) % Mod; } void build() { for (int i = 0; i < 2; i ++) a[i][i] = 1; } } g[N], f[2][N][2]; signed main() { ios::sync_with_stdio(0); cin.tie(0); int n, k; cin >> n >> k; for (int i = 1; i <= n; i ++) cin >> s[i]; g[n - 1].a[0][0] = g[n - 1].a[0][1] = g[n - 1].a[1][0] = 1; for (int i = n - 2; i >= 0; i --) g[i] = g[i + 1] * g[i + 1]; int now = 0, nx = 1; f[0][0][1].build(); for (int i = 0; i < n; i ++) { for (int j = 0; j <= min(i, k); j ++) { for (int l = 0; l < 2; l ++) { f[nx][j][l & (s[i + 1] == '0')] += f[now][j][l]; if (j + 1 <= k && (!l || s[i + 1] == '1')) { f[nx][j + 1][l & (s[i + 1] == '1')] += f[now][j][l] * g[i]; } f[now][j][l] = Mat(); } } swap(now, nx); } cout << (f[now][k][0].a[0][0] + f[now][k][1].a[0][0]) % Mod; return 0; } ``` :::: --- ##### [P1357 花园](https://www.luogu.com.cn/problem/P1357) 矩阵加速状压 dp。 ::::info[题意]{open} 给定 $n,m,k$，试求有多少长度为 $n$ 的**环状** $\texttt{01}$ 串满足任意长度为 $k$ 的子串中 $1$ 的数量不超过 $k$。 $2 \le n \le 10^{15}, 2 \le m \le \min(n, 5), 1 \le k < m$。 :::: $m$ 较小，考虑状压。设 $f_{i,S}$ 表示考虑前 $i$ 位，**最后 $m$ 位**的状态为 $S$ 的方案数。若可以从 $f_{i-1,S'}$ 转移而来，易得 $S'$ 的后 $m-1$ 位必须与 $S$ 的前 $m-1$ 位相同，于是有转移 $$ f_{i,S} \gets f_{i-1, S >> 1} + f_{i-1, (S >> 1) + 2^{m-1}} $$ 对于环上问题，我们要求第 $n + 1$ 位的状态与第 $1$ 位相同。即枚举初始状态 $T$，答案取 $f_{n+1,T}$。直接实现的时间复杂度为 $O(2^m n)$。 --- 注意到状态 $i$ 到状态 $j$ 是否转移是固定的，考虑构造一个 $2 ^ m \times 2 ^ m$ 的转移矩阵 $B$，使： $$ \begin{bmatrix} f_{i,S_1} & f_{i,S_2} & \dots & \end{bmatrix} = \begin{bmatrix} f_{i-1,S_1} & f_{i-1,S_2} & \dots & \end{bmatrix} \times B $$ 转移矩阵的构造方法：当状态 $j$ 可以转移到状态 $i$ 时，$B_{i,j} = 1$，否则为 $0$。由于最终状态必须与初始状态相同，答案即为 $\sum\limits_{i=1}^n B^n_{i,i}$。时间复杂度 $O(2^{3m} \log n)$。 ::::success[Code] ```cpp #include <bits/stdc++.h> #define endl '\n' using namespace std; typedef long long ll; const int N = 1e5 + 5, Mod = 1e9 + 7; int f[N][40]; int h[N][40]; ll n; int m, k; struct Matrix { int a[32][32]; Matrix () { memset(a, 0, sizeof a); } Matrix operator*(const Matrix& other) const { Matrix res; for (int i = 0; i < (1 << m); i ++) { for (int j = 0; j < (1 << m); j ++) { for (int k = 0; k < (1 << m); k ++) { res.a[i][j] = (0ll + res.a[i][j] + 1ll * a[i][k] * other.a[k][j] % Mod) % Mod; } } } return res; } } bs; signed main() { ios::sync_with_stdio(0), cin.tie(0); cin >> n >> m >> k; for (int i = 0; i < (1 << m); i ++) { if (__builtin_popcount(i) > k) continue; if (__builtin_popcount(i >> 1) <= k) bs.a[i >> 1][i] = 1; int l = (i >> 1) + (1 << (m - 1)); if (__builtin_popcount(l) <= k) bs.a[l][i] = 1; } Matrix res; for (int i = 0; i < (1 << m); i ++) res.a[i][i] = 1; ll x = n; while (x) { if (x & 1) res = res * bs; bs = bs * bs, x >>= 1; } int ans = 0; for (int i = 0; i < (1 << m); i ++) { ans = (ans + res.a[i][i]) % Mod; } cout << ans; return 0; } ``` :::: ### 2.3 优化图上问题 ##### [P2886 [USACO07NOV] Cow Relays G](https://www.luogu.com.cn/problem/P2886) ::::info[题意]{open} 给定 $m$ 条边 $(u_i, v_i, w_i)$ 的简单无向连通图，求 $s$ 到 $t$ 经过 $N$ 条边的最短路。 :::: 考虑 Floyd 的过程： ```cpp for (int k = 1; k <= n; k ++) { for (int i = 1; i <= n; i ++) { for (int j = 1; j <= n; j ++) { f[i][j] = min(f[i][j], f[i][k] + f[k][j] + w(i, j)); } } } ``` 我们这样写： ```cpp for (int k = 1; k <= n; k ++) { for (int i = 1; i <= n; i ++) { for (int j = 1; j <= n; j ++) { w[i][j] = min(w[i][j], w[i][k] + w[k][j]); } } } ``` 其中 $w_{i,j}$ 为图上边 $(i,j)$ 的边权（若不存在则为无穷大）。经过一次这样的乘法，$w_{i,j}$ 即为从 $i$ 到 $j$ 经过 $2$ 条边的最短路。推广可知，转移 $k$ 次后得到的是经过 $k + 1$ 条边的最短路。 #### 定义 **Min-plus 矩阵乘法**： $$ C_{i,j} = \min_{k=1}^n \{a_{i,k} + b_{k,j}\} $$ 得到的 $C$ 为两个 $n \times n$ 的矩阵 $A$ 与 $B$ 相乘的结果。成立由于 $\min$ 具有结合律。转移 $N - 1$ 次即得答案。使用矩阵快速幂，时间复杂度 $O(T^3 \log N)$。 ::::success[Code] ```cpp #include <bits/stdc++.h> #define endl '\n' #define INF 0x3f3f3f3f using namespace std; const int N = 205, M = 1e6 + 5; int U[M], V[M], w[M]; int id[N], cnt; int f[N][N]; int n, t; struct Matrix { int a[N][N]; Matrix () { memset(a, INF, sizeof a); } Matrix operator*(const Matrix b) const { Matrix c; for (int k = 1; k <= t; k ++) { for (int i = 1; i <= t; i ++) { for (int j = 1; j <= t; j ++) { c.a[i][j] = min(c.a[i][j], a[i][k] + b.a[k][j]); } } } return c; } } bs, ans; signed main() { // freopen("", "r", stdin); // freopen("", "w", stdout); ios::sync_with_stdio(0), cin.tie(0); int m, S, T; cin >> n >> m >> S >> T; for (int i = 1; i <= m; i ++) { cin >> w[i] >> U[i] >> V[i]; id[ ++ cnt] = U[i], id[ ++ cnt] = V[i]; } sort(id + 1, id + cnt + 1); t = unique(id + 1, id + cnt + 1) - id - 1; S = lower_bound(id + 1, id + t + 1, S) - id, T = lower_bound(id + 1, id + t + 1, T) - id; for (int i = 1; i <= m; i ++) { U[i] = lower_bound(id + 1, id + t + 1, U[i]) - id, V[i] = lower_bound(id + 1, id + t + 1, V[i]) - id; bs.a[U[i]][V[i]] = bs.a[V[i]][U[i]] = min(bs.a[U[i]][V[i]], w[i]); } Matrix ans = bs; n --; while (n) { if (n & 1) ans = ans * bs; bs = bs * bs, n >>= 1; } cout << ans.a[S][T]; return 0; } ``` :::: --- ##### [P3758 [TJOI2017] 可乐](https://www.luogu.com.cn/problem/P3758) ::::info[题意]{open} 给定 $n$ 个点 $m$ 条边的无向图和正整数 $t$，初始位于 $1$，每秒选择一种操作进行： + 前进：从 $u$ 走到与它相向的点 $v$； + 停留； + 结束。若第 $t$ 秒未结束，则在第 $t + 1$ 秒结束。问 $t$ 秒后的不同行动方案数。两种方案不同当且仅当结束时刻不同或某一时刻所在节点不同。 :::: 考虑这样一个问题：从 $s$ 出发走 $t$ 步（必须走）的方案数。设 $f_{i,j}$ 表示从 $i$ 走到 $j$ 的方案数，则转移为： $$ f_{i,j} = \sum\limits_{k=1}^n (f_{i,k} \times f_{k,j}) $$ 发现这与矩阵乘法一致，于是设 $f$ 表示走一步的方案数矩阵，$f^2$ 即可代表走两步的方案数……依此类推，$f^t$ 即为走 $t$ 步。那么答案即为 $\sum\limits_{i=1}^n f^t_{s,i}$。时间复杂度 $O(n^3 \log t)$。 --- 回到原问题，停留可以视为自环，但是结束似乎不好做。考虑对每个位置 $i$ 添加边 $i \to 0$，并在 $0$ 上连自环。由于 $0$ 没有连向其他点的边，结束操作就被转化为：走到 $0$ 并一直停留在该状态。很妙的 Trick。 ::::success[Code] ```cpp #include <bits/stdc++.h> #define endl '\n' using namespace std; typedef long long ll; const int N = 105, Mod = 2017; int n; struct Matrix { int a[N][N]; Matrix() { memset(a, 0, sizeof a); } Matrix operator*(const Matrix& b) const { Matrix c; for (int i = 0; i <= n; i ++) { for (int j = 0; j <= n; j ++) { for (int k = 0; k <= n; k ++) { c.a[i][j] += a[i][k] * b.a[k][j]; } c.a[i][j] %= Mod; } } return c; } } bs; signed main() { // freopen("", "r", stdin); // freopen("", "w", stdout); ios::sync_with_stdio(0), cin.tie(0); int m; cin >> n >> m; for (int i = 1; i <= m; i ++) { int u, v; cin >> u >> v; bs.a[u][v] = bs.a[v][u] = 1; } for (int i = 0; i <= n; i ++) bs.a[i][i] = 1, bs.a[i][0] = 1; int t; cin >> t; Matrix res; for (int i = 0; i <= n; i ++) res.a[i][i] = 1; while (t) { if (t & 1) res = res * bs; bs = bs * bs, t >>= 1; } int ans = 0; for (int i = 0; i <= n; i ++) ans = (ans + res.a[1][i]) % Mod; cout << ans; return 0; } ``` :::: --- ##### [P2233 [HNOI2002] 公交车路线](https://www.luogu.com.cn/problem/P2233) 类似上一题，考虑连边 $i \to next$、$i \to prev$，对于 $i = 5$（E）则不向外连边。由于到 $5$ 结束，转化为 $n-1$ 步走到 $4$ 或 $6$ 的方案数和。 --- **模拟赛题，来源未知** ::::info[题意]{open} 给定 $x,y$ 和 $n$ 个节点的树，并给定长度为 $n$ 的正整数序列 $a$，定义递推式： $$ f_i = \begin{cases} i & 0 \le i \le 1 \\ x \times f_{i-1} + y \times f_{i-2} & i > 1 \end{cases} $$ 对于 $1 \le i \le n$，对于其子树内的无序点对 $(x,y)$（$a_x \neq a_y$），求： $$ \sum f_{|a_x-a_y|} \bmod 998244353 $$ $1 \le n \le 10^5, 1 \le x,y,a_i < 998244353$。 :::: 考虑加入一个点 $x$ 对答案的贡献： $$ \sum_{a_y < a_x} f_{a_x - a_y} + \sum_{a_y > a_x} f_{a_y - a_x} $$ 把递推式写成矩阵形式： $$ \begin{bmatrix} f_i & f_{i-1} \end{bmatrix} = \begin{bmatrix} f_{i-1} & f_{i-2} \end{bmatrix} \times \begin{bmatrix} x & 1 \\ y & 0 \end{bmatrix} $$ 记转移矩阵为 $A$。则贡献为： $$ (\sum_{a_y < a_x} A^{a_x}) \times A^{-a_y} + (\sum_{a_y > a_x} A^y) \times A^x $$ ::::warning[注意]{open} 矩阵乘法具有分配律，不具有交换律。 :::: 容易求出 $A$ 的逆矩阵 $B = A^{-1} = \begin{bmatrix} 0 & \frac{1}{y} \\ 1 & \frac{x}{y}\end{bmatrix}$。dsu on tree 动态维护，树状数组维护 $\sum A^{x}$ 和 $\sum B^{x}$。时间复杂度 $O(n \log^2 n)$。 ::::success[Code] ```cpp #include <bits/stdc++.h> #define endl '\n' using namespace std; typedef long long ll; const int N = 5e5 + 5, Mod = 998244353; int n, t; int a[N], a1[N], id[N]; int dfn[N], siz[N], rnk[N], times, son[N]; ll ans[N], now; vector<int> G[N]; void Add(ll &x, ll v) { x += v; if (x >= Mod) x -= Mod; } void Del(ll &x, ll v) { x -= v; if (x < 0) x += Mod; } struct Matrix { ll a[2][2]; Matrix () { memset(a, 0, sizeof a); } Matrix operator*(const Matrix& b) const { Matrix c; c.a[0][0] = (a[0][0] * b.a[0][0] + a[0][1] * b.a[1][0]) % Mod; c.a[0][1] = (a[0][0] * b.a[0][1] + a[0][1] * b.a[1][1]) % Mod; c.a[1][0] = (a[1][0] * b.a[0][0] + a[1][1] * b.a[1][0]) % Mod; c.a[1][1] = (a[1][0] * b.a[0][1] + a[1][1] * b.a[1][1]) % Mod; return c; } void operator+=(const Matrix &b) { Add(a[0][0], b.a[0][0]), Add(a[0][1], b.a[0][1]), Add(a[1][0], b.a[1][0]), Add(a[1][1], b.a[1][1]); } void operator-=(const Matrix &b) { Del(a[0][0], b.a[0][0]), Del(a[0][1], b.a[0][1]), Del(a[1][0], b.a[1][0]), Del(a[1][1], b.a[1][1]); } Matrix operator+(const Matrix& b) const { Matrix c; c.a[0][0] = a[0][0], c.a[0][1] = a[0][1], c.a[1][0] = a[1][0], c.a[1][1] = a[1][1]; c += b; return c; } Matrix operator-(const Matrix& b) const { Matrix c; c.a[0][0] = a[0][0], c.a[0][1] = a[0][1], c.a[1][0] = a[1][0], c.a[1][1] = a[1][1]; c -= b; return c; } } bs, bsT, F[2][N]; Matrix get(int n, Matrix b) { Matrix res; res.a[0][0] = res.a[1][1] = 1; while (n) { if (n & 1) res = res * b; b = b * b, n >>= 1; } return res; } struct BIT { Matrix T[N]; void update(int x, int v, int f) { Matrix fst = F[f][x]; for (; x <= t; x += (x & (-x))) { if (v == 1) T[x] += fst; else T[x] -= fst; } } Matrix query(int x) { Matrix res; for (; x; x -= (x & (-x))) res += T[x]; return res; } } T1, T2; void dfs1(int u, int fa) { dfn[u] = ++ times, siz[u] = 1, rnk[times] = u; for (int i = 0; i < (int)G[u].size(); i ++) { int v = G[u][i]; if (v == fa) continue; dfs1(v, u); if (!son[u] || siz[v] > siz[son[u]]) son[u] = v; siz[u] += siz[v]; } } void add(int x) { T1.update(id[x], 1, 0), T2.update(id[x], 1, 1); Add(now, ((T1.query(t) - T1.query(id[x])) * F[1][id[x]] + F[0][id[x]] * T2.query(id[x] - 1)).a[0][1]); } void del(int x) { Del(now, ((T1.query(t) - T1.query(id[x])) * F[1][id[x]] + F[0][id[x]] * T2.query(id[x] - 1)).a[0][1]); T1.update(id[x], -1, 0), T2.update(id[x], -1, 1); } void dfs2(int u, int fa, bool keep) { for (auto v : G[u]) { if (v == fa || v == son[u]) continue; dfs2(v, u, 0); } if (son[u]) dfs2(son[u], u, 1); for (auto v : G[u]) { if (v == fa || v == son[u]) continue; for (int i = dfn[v]; i <= dfn[v] + siz[v] - 1; i ++) add(rnk[i]); } add(u); ans[u] = now; if (!keep) { for (int i = dfn[u]; i <= dfn[u] + siz[u] - 1; i ++) del(rnk[i]); } } int qpow(ll a, int b) { ll res = 1; while (b) { if (b & 1) res = res * a % Mod; a = a * a % Mod, b >>= 1; } return res; } signed main() { ios::sync_with_stdio(0), cin.tie(0); ll x, y; int sub; cin >> n >> x >> y >> sub; for (int i = 1; i <= n; i ++) cin >> a[i], a1[i] = a[i]; sort(a1 + 1, a1 + n + 1); t = unique(a1 + 1, a1 + n + 1) - a1 - 1; for (int i = 1; i <= n; i ++) id[i] = lower_bound(a1 + 1, a1 + t + 1, a[i]) - a1; bs.a[0][0] = x, bs.a[1][0] = y, bs.a[0][1] = 1; ll inv = qpow(y, Mod - 2); bsT.a[1][0] = 1, bsT.a[0][1] = inv, bsT.a[1][1] = Mod - x * inv % Mod; for (int i = 1; i <= t; i ++) { F[0][i] = get(a1[i], bs), F[1][i] = get(a1[i], bsT); } for (int i = 1; i < n; i ++) { int u, v; cin >> u >> v; G[u].push_back(v), G[v].push_back(u); } dfs1(1, 0); dfs2(1, 0, 1); for (int i = 1; i <= n; i ++) cout << ans[i] << endl; return 0; } ``` :::: （图上问题题目有待补充，[这里](https://www.luogu.com.cn/training/7403)有一些） ## 结语这篇文章只触及了矩阵加速的冰山一角，更广的应用 / 更难的题目还待读者自己探索。如发现文章有错误 / 有对本文的建议，还望指出，非常感谢！