Character frequencies in the code of NOIP2025 JS contestants

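In the recently concluded NOIP2025, contestants from Jiangsu (JS) submitted a total of 2581 source files containing 3803265 English (ASCII) characters, counting spaces, tabs and newlines. These figures come from the following code: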

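int ch;
bool iszh=false; // must start as false; true means "skip the next byte"
while(1)
{
  ch=inf.get(); // inf: input stream of one submission, declared elsewhere
  if(ch==EOF) break;
  if(iszh) iszh=false; // second byte of a two-byte character: skip it
  else if(ch>127 || ch<0) iszh=true; // non-ASCII lead byte: skip it and the next byte
  else cnt[ch]++; // plain ASCII character: count it (cnt declared elsewhere)
}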

Finally, I sorted the counted characters in descending order and printed each one's count and frequency. The results are as follows:

# 1    (ASCII  32) : 1004128 counts (26.40174%)
# 2  p (ASCII 112) :  220794 counts (5.80538%)
# 3  i (ASCII 105) :  165925 counts (4.36270%)
# 4 \n (ASCII  10) :  163372 counts (4.29557%)
# 5  n (ASCII 110) :  155803 counts (4.09656%)
# 6  ; (ASCII  59) :  113112 counts (2.97408%)
# 7  t (ASCII 116) :  111518 counts (2.93216%)
# 8  e (ASCII 101) :   95697 counts (2.51618%)
# 9  s (ASCII 115) :   81185 counts (2.13461%)
#10  ) (ASCII  41) :   78105 counts (2.05363%)
#11  ( (ASCII  40) :   78099 counts (2.05347%)
#12  = (ASCII  61) :   75823 counts (1.99363%)
#13  o (ASCII 111) :   73128 counts (1.92277%)
#14  a (ASCII  97) :   72297 counts (1.90092%)
#15  r (ASCII 114) :   71173 counts (1.87137%)
#16  l (ASCII 108) :   58923 counts (1.54927%)
#17  [ (ASCII  91) :   58433 counts (1.53639%)
#18  ] (ASCII  93) :   58422 counts (1.53610%)
#19  d (ASCII 100) :   56111 counts (1.47534%)
#20  + (ASCII  43) :   55205 counts (1.45152%)
#21  c (ASCII  99) :   54276 counts (1.42709%)
#22  , (ASCII  44) :   53392 counts (1.40385%)
#23  f (ASCII 102) :   51528 counts (1.35484%)
#24  1 (ASCII  49) :   49279 counts (1.29570%)
#25  < (ASCII  60) :   48649 counts (1.27914%)
#26  m (ASCII 109) :   48008 counts (1.26228%)
#27  u (ASCII 117) :   46750 counts (1.22921%)
#28  0 (ASCII  48) :   37489 counts (0.98571%)
#29 \t (ASCII   9) :   35368 counts (0.92994%)
#30  . (ASCII  46) :   31209 counts (0.82058%)
#31  " (ASCII  34) :   30018 counts (0.78927%)
#32  { (ASCII 123) :   27683 counts (0.72787%)
#33  } (ASCII 125) :   27682 counts (0.72785%)
#34  > (ASCII  62) :   27656 counts (0.72716%)
#35  x (ASCII 120) :   25615 counts (0.67350%)
#36  - (ASCII  45) :   20986 counts (0.55179%)
#37  g (ASCII 103) :   19987 counts (0.52552%)
#38  2 (ASCII  50) :   19891 counts (0.52300%)
#39  / (ASCII  47) :   18053 counts (0.47467%)
#40  b (ASCII  98) :   16921 counts (0.44491%)
#41  h (ASCII 104) :   16540 counts (0.43489%)
#42  w (ASCII 119) :   15575 counts (0.40952%)
#43  j (ASCII 106) :   14738 counts (0.38751%)
#44  y (ASCII 121) :   14732 counts (0.38735%)
#45  v (ASCII 118) :   13549 counts (0.35625%)
#46  5 (ASCII  53) :   11341 counts (0.29819%)
#47  4 (ASCII  52) :   10279 counts (0.27027%)
#48  k (ASCII 107) :    9608 counts (0.25263%)
#49  3 (ASCII  51) :    9429 counts (0.24792%)
#50  * (ASCII  42) :    8733 counts (0.22962%)
#51  & (ASCII  38) :    8250 counts (0.21692%)
#52  _ (ASCII  95) :    7558 counts (0.19872%)
#53  ' (ASCII  39) :    7294 counts (0.19178%)
#54  8 (ASCII  56) :    7182 counts (0.18884%)
#55  % (ASCII  37) :    7068 counts (0.18584%)
#56  q (ASCII 113) :    7012 counts (0.18437%)
#57  9 (ASCII  57) :    6868 counts (0.18058%)
#58  N (ASCII  78) :    6739 counts (0.17719%)
#59  # (ASCII  35) :    6306 counts (0.16580%)
#60  : (ASCII  58) :    5223 counts (0.13733%)
#61  z (ASCII 122) :    4994 counts (0.13131%)
#62  6 (ASCII  54) :    4872 counts (0.12810%)
#63  7 (ASCII  55) :    4783 counts (0.12576%)
#64  \ (ASCII  92) :    4508 counts (0.11853%)
#65  L (ASCII  76) :    2958 counts (0.07778%)
#66  T (ASCII  84) :    2859 counts (0.07517%)
#67  M (ASCII  77) :    2731 counts (0.07181%)
#68  | (ASCII 124) :    2164 counts (0.05690%)
#69  ! (ASCII  33) :    2132 counts (0.05606%)
#70  A (ASCII  65) :    1918 counts (0.05043%)
#71  O (ASCII  79) :    1757 counts (0.04620%)
#72  C (ASCII  67) :    1751 counts (0.04604%)
#73  R (ASCII  82) :    1499 counts (0.03941%)
#74  I (ASCII  73) :    1488 counts (0.03912%)
#75  F (ASCII  70) :    1277 counts (0.03358%)
#76  P (ASCII  80) :    1167 counts (0.03068%)
#77  D (ASCII  68) :    1080 counts (0.02840%)
#78  X (ASCII  88) :    1010 counts (0.02656%)
#79  S (ASCII  83) :     970 counts (0.02550%)
#80  G (ASCII  71) :     905 counts (0.02380%)
#81  E (ASCII  69) :     832 counts (0.02188%)
#82  B (ASCII  66) :     801 counts (0.02106%)
#83  ^ (ASCII  94) :     717 counts (0.01885%)
#84  ? (ASCII  63) :     516 counts (0.01357%)
#85  Q (ASCII  81) :     326 counts (0.00857%)
#86  Y (ASCII  89) :     214 counts (0.00563%)
#87  W (ASCII  87) :     200 counts (0.00526%)
#88  V (ASCII  86) :     190 counts (0.00500%)
#89  U (ASCII  85) :     183 counts (0.00481%)
#90  K (ASCII  75) :     161 counts (0.00423%)
#91  Z (ASCII  90) :     161 counts (0.00423%)
#92  ~ (ASCII 126) :     148 counts (0.00389%)
#93  H (ASCII  72) :      82 counts (0.00216%)
#94  J (ASCII  74) :      56 counts (0.00147%)
#95  @ (ASCII  64) :      36 counts (0.00095%)
#96  $ (ASCII  36) :       2 counts (0.00005%)
files = 2581  characters = 3803265
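
As a rough sketch of how a listing in this shape could be produced from the `cnt` array, the two helpers below sort the counted characters and format one row. The names `rank_chars` and `format_row`, the tie-breaking rule (ascending ASCII code) and the exact format string are assumptions made for illustration, not the code that actually generated the output above.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (ASCII code, count) pairs for every character
// that occurred at least once, sorted by count in descending order.
// Ties are broken by ascending ASCII code (an assumption).
std::vector<std::pair<int, long long>> rank_chars(const long long cnt[128]) {
    std::vector<std::pair<int, long long>> v;
    for (int c = 0; c < 128; ++c)
        if (cnt[c] > 0) v.emplace_back(c, cnt[c]);
    std::sort(v.begin(), v.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    });
    return v;
}

// Hypothetical helper: formats one row of the listing.
// Newline and tab are shown as the escapes \n and \t; other characters as-is.
std::string format_row(int rank, int c, long long count, long long total) {
    std::string name;
    if (c == '\n') name = "\\n";
    else if (c == '\t') name = "\\t";
    else name = std::string(1, static_cast<char>(c));
    char buf[96];
    std::snprintf(buf, sizeof buf, "#%2d %2s (ASCII %3d) : %7lld counts (%.5f%%)",
                  rank, name.c_str(), c, count, 100.0 * count / total);
    return buf;
}
```

Calling `rank_chars(cnt)` and then `format_row(i + 1, v[i].first, v[i].second, total)` for each entry would print one line per character, with `total` being the overall number of counted characters.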

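Compared with the 2024 data, spaces still make up about a quarter of all code and lead by a wide margin; i, n and \n also remain near the top. This indirectly reflects how stable the coding style of OI contestants is.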

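Strangely, the frequency of the character p jumped from 0.9% in 2024 to 5.8% this year. After some investigation, it turned out that contestant JS-0555 had appended a huge number of p's to the end of their code. This shows how much chance and uncertainty there is in code.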
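
The post does not say how the investigation was done. One possible approach, sketched here under the assumption that per-file counts of a single character are available, is to find the file that contributes the largest share of that character. The struct `Outlier` and the function `largest_share` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: index of the dominant file and the fraction of
// all occurrences of the character that this file contributes.
struct Outlier {
    int file_index;  // -1 if the input is empty
    double share;    // in [0, 1]
};

// per_file[i] is how many times the character of interest occurs in file i.
Outlier largest_share(const std::vector<long long>& per_file) {
    long long total = 0;
    int best = -1;
    for (std::size_t i = 0; i < per_file.size(); ++i) {
        total += per_file[i];
        if (best < 0 || per_file[i] > per_file[best]) best = static_cast<int>(i);
    }
    // Guard against an empty list or a character that never occurs.
    double share = (total > 0) ? static_cast<double>(per_file[best]) / total : 0.0;
    return {best, share};
}
```

A share close to 1 for a character that is normally spread evenly across submissions would point to a single file distorting the statistics.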

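For each of the three bracket pairs (), [] and {}, the opening and closing characters still appear with nearly identical frequencies.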

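Among the characters that make up identifiers, the overall trend is lowercase letters > digits > uppercase letters. The uppercase letter N takes first place among uppercase letters because it is so often used as the constant describing an array's length.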
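
The comparison between character classes can be checked by summing the per-character counts. The following is a minimal sketch, assuming the same `cnt` array indexed by ASCII code; `ClassTotals` and `classify` are hypothetical names, and the underscore is deliberately left out of all three classes.

```cpp
// Hypothetical aggregate: total occurrences per character class.
struct ClassTotals {
    long long lower = 0;  // 'a'..'z'
    long long digit = 0;  // '0'..'9'
    long long upper = 0;  // 'A'..'Z'
};

// Sums the counts of lowercase letters, digits and uppercase letters.
// Explicit ranges are used so the result does not depend on the locale.
ClassTotals classify(const long long cnt[128]) {
    ClassTotals t;
    for (int c = 0; c < 128; ++c) {
        if (c >= 'a' && c <= 'z') t.lower += cnt[c];
        else if (c >= '0' && c <= '9') t.digit += cnt[c];
        else if (c >= 'A' && c <= 'Z') t.upper += cnt[c];
    }
    return t;
}
```

Comparing `t.lower`, `t.digit` and `t.upper` then gives the ordering directly.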

The two $ characters found in the code may have come from a contestant who inserted LaTeX syntax into their code.

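Interestingly, not a single contestant wrote a ` in their code this year. Congratulations! Also, the characters K and Z appear exactly the same number of times. Congratulations!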