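Character Frequencies in NOIP2025 JS Contestants' Code
__Silvefish__ · Leisure · Entertainment
In the recently concluded NOIP2025, contestants from Jiangsu (JS) submitted 2581 source files containing 3803265 English (ASCII) characters in total, including spaces, tabs, and newlines. These figures were obtained with the following code: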
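int ch;
bool iszh = false;  // true right after the lead byte of a two-byte (non-ASCII) character
while (true)
{
    ch = inf.get();
    if (ch == EOF) break;
    if (iszh) iszh = false;                    // second byte of a two-byte character: skip it
    else if (ch > 127 || ch < 0) iszh = true;  // lead byte of a non-ASCII character
    else cnt[ch]++;                            // plain ASCII byte: count it
}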
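Finally, I sorted every counted character by its number of occurrences in descending order and printed each count together with its frequency. The results are as follows: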
# 1 (ASCII 32) : 1004128 counts (26.40174%)
# 2 p (ASCII 112) : 220794 counts (5.80538%)
# 3 i (ASCII 105) : 165925 counts (4.36270%)
# 4 \n (ASCII 10) : 163372 counts (4.29557%)
# 5 n (ASCII 110) : 155803 counts (4.09656%)
# 6 ; (ASCII 59) : 113112 counts (2.97408%)
# 7 t (ASCII 116) : 111518 counts (2.93216%)
# 8 e (ASCII 101) : 95697 counts (2.51618%)
# 9 s (ASCII 115) : 81185 counts (2.13461%)
#10 ) (ASCII 41) : 78105 counts (2.05363%)
#11 ( (ASCII 40) : 78099 counts (2.05347%)
#12 = (ASCII 61) : 75823 counts (1.99363%)
#13 o (ASCII 111) : 73128 counts (1.92277%)
#14 a (ASCII 97) : 72297 counts (1.90092%)
#15 r (ASCII 114) : 71173 counts (1.87137%)
#16 l (ASCII 108) : 58923 counts (1.54927%)
#17 [ (ASCII 91) : 58433 counts (1.53639%)
#18 ] (ASCII 93) : 58422 counts (1.53610%)
#19 d (ASCII 100) : 56111 counts (1.47534%)
#20 + (ASCII 43) : 55205 counts (1.45152%)
#21 c (ASCII 99) : 54276 counts (1.42709%)
#22 , (ASCII 44) : 53392 counts (1.40385%)
#23 f (ASCII 102) : 51528 counts (1.35484%)
#24 1 (ASCII 49) : 49279 counts (1.29570%)
#25 < (ASCII 60) : 48649 counts (1.27914%)
#26 m (ASCII 109) : 48008 counts (1.26228%)
#27 u (ASCII 117) : 46750 counts (1.22921%)
#28 0 (ASCII 48) : 37489 counts (0.98571%)
#29 \t (ASCII 9) : 35368 counts (0.92994%)
#30 . (ASCII 46) : 31209 counts (0.82058%)
#31 " (ASCII 34) : 30018 counts (0.78927%)
#32 { (ASCII 123) : 27683 counts (0.72787%)
#33 } (ASCII 125) : 27682 counts (0.72785%)
#34 > (ASCII 62) : 27656 counts (0.72716%)
#35 x (ASCII 120) : 25615 counts (0.67350%)
#36 - (ASCII 45) : 20986 counts (0.55179%)
#37 g (ASCII 103) : 19987 counts (0.52552%)
#38 2 (ASCII 50) : 19891 counts (0.52300%)
#39 / (ASCII 47) : 18053 counts (0.47467%)
#40 b (ASCII 98) : 16921 counts (0.44491%)
#41 h (ASCII 104) : 16540 counts (0.43489%)
#42 w (ASCII 119) : 15575 counts (0.40952%)
#43 j (ASCII 106) : 14738 counts (0.38751%)
#44 y (ASCII 121) : 14732 counts (0.38735%)
#45 v (ASCII 118) : 13549 counts (0.35625%)
#46 5 (ASCII 53) : 11341 counts (0.29819%)
#47 4 (ASCII 52) : 10279 counts (0.27027%)
#48 k (ASCII 107) : 9608 counts (0.25263%)
#49 3 (ASCII 51) : 9429 counts (0.24792%)
#50 * (ASCII 42) : 8733 counts (0.22962%)
#51 & (ASCII 38) : 8250 counts (0.21692%)
#52 _ (ASCII 95) : 7558 counts (0.19872%)
#53 ' (ASCII 39) : 7294 counts (0.19178%)
#54 8 (ASCII 56) : 7182 counts (0.18884%)
#55 % (ASCII 37) : 7068 counts (0.18584%)
#56 q (ASCII 113) : 7012 counts (0.18437%)
#57 9 (ASCII 57) : 6868 counts (0.18058%)
#58 N (ASCII 78) : 6739 counts (0.17719%)
#59 # (ASCII 35) : 6306 counts (0.16580%)
#60 : (ASCII 58) : 5223 counts (0.13733%)
#61 z (ASCII 122) : 4994 counts (0.13131%)
#62 6 (ASCII 54) : 4872 counts (0.12810%)
#63 7 (ASCII 55) : 4783 counts (0.12576%)
#64 \ (ASCII 92) : 4508 counts (0.11853%)
#65 L (ASCII 76) : 2958 counts (0.07778%)
#66 T (ASCII 84) : 2859 counts (0.07517%)
#67 M (ASCII 77) : 2731 counts (0.07181%)
#68 | (ASCII 124) : 2164 counts (0.05690%)
#69 ! (ASCII 33) : 2132 counts (0.05606%)
#70 A (ASCII 65) : 1918 counts (0.05043%)
#71 O (ASCII 79) : 1757 counts (0.04620%)
#72 C (ASCII 67) : 1751 counts (0.04604%)
#73 R (ASCII 82) : 1499 counts (0.03941%)
#74 I (ASCII 73) : 1488 counts (0.03912%)
#75 F (ASCII 70) : 1277 counts (0.03358%)
#76 P (ASCII 80) : 1167 counts (0.03068%)
#77 D (ASCII 68) : 1080 counts (0.02840%)
#78 X (ASCII 88) : 1010 counts (0.02656%)
#79 S (ASCII 83) : 970 counts (0.02550%)
#80 G (ASCII 71) : 905 counts (0.02380%)
#81 E (ASCII 69) : 832 counts (0.02188%)
#82 B (ASCII 66) : 801 counts (0.02106%)
#83 ^ (ASCII 94) : 717 counts (0.01885%)
#84 ? (ASCII 63) : 516 counts (0.01357%)
#85 Q (ASCII 81) : 326 counts (0.00857%)
#86 Y (ASCII 89) : 214 counts (0.00563%)
#87 W (ASCII 87) : 200 counts (0.00526%)
#88 V (ASCII 86) : 190 counts (0.00500%)
#89 U (ASCII 85) : 183 counts (0.00481%)
#90 K (ASCII 75) : 161 counts (0.00423%)
#91 Z (ASCII 90) : 161 counts (0.00423%)
#92 ~ (ASCII 126) : 148 counts (0.00389%)
#93 H (ASCII 72) : 82 counts (0.00216%)
#94 J (ASCII 74) : 56 counts (0.00147%)
#95 @ (ASCII 64) : 36 counts (0.00095%)
#96 $ (ASCII 36) : 2 counts (0.00005%)
files = 2581 characters = 3803265
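The counting, sorting, and printing steps behind the table above could be sketched roughly as follows. This is not the author's actual program: the function names `count_ascii`, `rank_counts`, and `format_row` are hypothetical, the tie-breaking rule (smaller ASCII code first) is only inferred from K appearing before Z in the table, and the skipping of two-byte characters assumes a GBK-like encoding as in the original snippet.

```cpp
#include <algorithm>
#include <array>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Count ASCII bytes in a stream. A byte above 127 is treated as the lead
// byte of a two-byte character, and the byte after it is skipped
// (assumption: GBK-like encoding, as in the original snippet).
std::array<long long, 128> count_ascii(std::istream& in) {
    std::array<long long, 128> cnt{};
    bool skip_next = false;
    int ch;
    while ((ch = in.get()) != EOF) {
        if (skip_next) skip_next = false;
        else if (ch > 127) skip_next = true;
        else cnt[ch]++;
    }
    return cnt;
}

// Collect (code, count) pairs with nonzero counts, sorted by count in
// descending order; ties are broken by smaller ASCII code (assumption).
std::vector<std::pair<int, long long>> rank_counts(
        const std::array<long long, 128>& cnt) {
    std::vector<std::pair<int, long long>> ranked;
    for (int c = 0; c < 128; ++c)
        if (cnt[c] > 0) ranked.push_back({c, cnt[c]});
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<int, long long>& a,
                 const std::pair<int, long long>& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    return ranked;
}

// Format one table row; newline and tab are shown as escape sequences.
std::string format_row(int rank, int code, long long count, long long total) {
    std::string label(1, static_cast<char>(code));
    if (code == '\n') label = "\\n";
    if (code == '\t') label = "\\t";
    char buf[128];
    std::snprintf(buf, sizeof buf, "#%2d %s (ASCII %d) : %lld counts (%.5f%%)",
                  rank, label.c_str(), code, count, 100.0 * count / total);
    return buf;
}
```

A caller would open each submission as a `std::ifstream`, add the per-file arrays together, pass the sum to `rank_counts`, and print one `format_row` line per entry.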
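Compared with the 2024 data, the space character still accounts for about a quarter of all characters, far ahead of everything else; i, n, and \n also remain near the top, which indirectly reflects how stable OI contestants' coding style is.
Strangely, the frequency of the character p jumped from 0.9% in 2024 to 5.8% this year. On investigation, it turned out that contestant JS-0555 had appended a large number of p characters to the end of their code. This shows how much chance and unpredictability such a code sample can contain.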
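For each of the three bracket pairs (), [], and {}, the counts of the opening and closing characters remain very close to each other.
Among the characters that make up identifiers, the overall trend is lowercase letters > digits > uppercase letters. The uppercase letter N ranks first among uppercase letters because it is so often used as the constant describing an array's length.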
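The lowercase > digits > uppercase comparison can be checked by summing the per-character counts by class. A minimal sketch, assuming the counts are stored in a 128-entry array indexed by ASCII code; `ClassTotals` and `class_totals` are hypothetical names, and the underscore is left out because the comparison above only mentions letters and digits.

```cpp
#include <array>
#include <cctype>

// Totals for the three character classes compared in the text.
struct ClassTotals {
    long long lower = 0;
    long long digit = 0;
    long long upper = 0;
};

// Sum per-character counts into lowercase, digit, and uppercase totals.
// Characters outside these three classes are ignored.
ClassTotals class_totals(const std::array<long long, 128>& cnt) {
    ClassTotals t;
    for (int c = 0; c < 128; ++c) {
        if (std::islower(c)) t.lower += cnt[c];
        else if (std::isdigit(c)) t.digit += cnt[c];
        else if (std::isupper(c)) t.upper += cnt[c];
    }
    return t;
}
```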
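The two $ characters in the code probably come from a contestant who inserted LaTeX syntax into their source.
Interestingly, not a single contestant wrote a ` in their code this year. Congratulations! Also, the characters K and Z appear exactly the same number of times. Congratulations!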