P9206 Immortal "Xu Fu Spacetime"
Background
Xu Fu was a local alchemist (fangshi, 方士) from the Qi region during the Qin dynasty.
By order of the First Emperor of Qin, Xu Fu led three thousand young boys and girls on a journey to find the legendary "Penglai medicine". But in the end, he never returned.
Where did Xu Fu finally go? Did he find the Penglai medicine? These questions are no longer important.
Description
The passage of time can be abstracted as the time spent sorting a number sequence. Different sorting strategies take different amounts of time. Here we introduce a sorting algorithm that was a milestone in human exploration of sorting: Shell sort.
Shell sort can be seen as an optimization of insertion sort. To study the running efficiency of Shell sort, we want you to implement a simple Shell sort process. Before that, we will standardize the exact procedure of insertion sort and how to measure the "cost" of an insertion sort process.
### Insertion Sort
For an array $a=[a_1,a_2,\cdots,a_n]$ of length $n$, the idea of insertion sort is to scan each element from left to right and insert it into the correct position:

The figure shows a typical insertion sort process. In round $i$, we insert the element at index $i$ **before the first element in the sorted part that is greater than $\bm{a_i}$**. Suppose $a_i$ is finally inserted into position $b_i$. Then we define the cost of this round as $|a_i-b_i|+1$. The cost of the whole insertion sort process is the sum of the costs of all rounds.
### Shell Sort
To reduce the cost of insertion sort, we introduce Shell sort. Shell sort splits the whole sorting process into several rounds. In each round, it groups elements by a fixed gap and sorts the elements within each group separately. In the last round, Shell sort performs a final insertion sort on the entire array.
The grouping method is as follows. Choose an integer $d$, and divide the indices into the following groups:
- Elements with indices $1,1+d,1+2d,\cdots$.
- Elements with indices $2,2+d,2+2d,\cdots$.
- Elements with indices $3,3+d,3+2d,\cdots$.
- ……
- Elements with indices $d,2d,3d,\cdots$.
Below is the process of **one round** of Shell sort. We choose $d=3$.

In each round, Shell sort chooses a value of $d$, and in the last round it uses $d=1$. After performing such sorting in every round, it finally obtains a sorted array.

Although it looks like many insertion sorts are performed, the total cost of all insertion sorts across all rounds may be much smaller than the cost of performing insertion sort once on the whole array (this is not very obvious in the example above; you can refer to the examples in Samples $2,3$).
In fact, Shell sort is the first sorting algorithm discovered by humans whose worst-case complexity is lower than $\Theta (n^2)$. For example, when choosing $d=2^k-1,\ k=\lfloor\log_2 n\rfloor,\lfloor\log_2 n\rfloor-1,\lfloor\log_2 n\rfloor-2,\cdots,1$, the worst-case time complexity of the whole process is $\mathcal \Theta(n^{3/2})$.
Input Format
The first line contains two integers $n,m$, representing the length of the array to be sorted and the number of rounds of Shell sort.
The second line contains $n$ integers $a_1,a_2,\cdots,a_n$, describing the elements in the array.
The third line contains $m$ integers $d_1,d_2,\cdots,d_m$, describing the value of $d$ chosen in each round. The testdata guarantees that $d_m=1$.
Output Format
The first line outputs an integer $\mathrm{cost}$, which is the sum of the costs produced by insertion sort during the whole Shell sort process.
The second line outputs $n$ integers, representing the array $a$ after sorting.
Explanation/Hint
### Constraints and Notes
For all testdata, it is guaranteed that $1\le n\le 10^5$, $1\le m\le 100$, $\mathrm{cost}\le 5\times 10^7$, $1\le a_i\le 10^9$, $1\le d_i\le n$, and $d_m=1$.
Translated by ChatGPT 5