P10344 [THUSC 2019] PCAP File Parsing, Reading, and Writing
Background
Note: Please download the "Study Manual" mentioned in the statement from the problem attachments. Starting from this problem, the following five problems are the five subtasks of THUSC 2019 Round 3 "Network Traffic Analysis and Processing". The original contest had six subtasks; the last one was feedback and is not included here.
---
Pengpeng is a student in the Department of Computer Science at Huaqing University, and he is about to start the first semester of his sophomore year. Since he is very good at OI, he feels that the only required major course this semester, "Data Structures", is far too easy. With strong recommendations from senior student Jie Ge, he chose in advance the junior-year major course "Principles of Computer Networks". However, the big course project for this class is to implement a software router by hand, which is difficult for Pengpeng, who has no engineering experience. In order not to disappoint Jie Ge, he decided to finish this task during the summer vacation.
This problem is divided into 6 subtasks with increasing difficulty. You will learn the basic principles of networks with Pengpeng, understand how network devices work, and implement functionalities step by step, eventually completing a simplified router software. Though small, it is fully functional: it will be able to process multiple basic network protocols and take on the most fundamental mission of a router—forwarding.
In **all subtasks** of this problem, we make the following conventions:
* Your computer has 16 network interfaces, numbered from 1 to 16.
* The default routing table on your computer is empty, and it can only get routing table updates from RIP packets.
* Each interface corresponds to **one** IP address `10.2.n.1/24`, where $n$ is the interface number, ranging from 1 to 16.
* Each interface corresponds to one MAC address `98:01:29:00:00:xx`, where `xx` is the hexadecimal representation of the interface number, ranging from $01_{16}$ to $10_{16}$.
* All traffic fragments, including those read and written, are Ethernet frames (including the frame header and the checksum at the frame tail), containing complete protocol formats.
* Unless otherwise specified, the traffic fragments in the input PCAP file are arranged in chronological order (timestamps are monotonically increasing), and you only need to process them sequentially.
If you cannot understand the above conventions, please refer to the network basics part of the "Study Manual".
Description
In all problems, network traffic will be stored in PCAP file format. For the detailed format, please refer to the "Study Manual". In this problem, you need to perform basic reading, writing, and parsing of PCAP files.
In this problem, you need to traverse the traffic fragments in the input file whose timestamps are out of order, perform some operations on them, and then assemble them into PCAP format and write to the output. For file I/O methods, you may refer to the "Study Manual".
The required operations are: only keep fragments whose length is **no greater than** $1000$ (only considering the full Ethernet frame length, not including the per-packet header in PCAP format), and sort these fragments in ascending timestamp order (i.e., compare seconds first, then microseconds). You should ensure that no two fragments have the same timestamp. Note that your output file must be a valid PCAP file. To test whether the file format is valid, you can open your output file with tools such as Wireshark to check.
Different from what the manual indicates, in this problem you should directly copy the entire per-packet header of each traffic fragment, including the timestamp fields.
Input Format
The input contains at most $n$ traffic fragments, with a total size of at most $m$ bytes.
Output Format
Your output will be compared with the answer file **byte by byte**. Do not output any extra information, to avoid unnecessary loss of points.
Explanation/Hint
**[Subtasks]**
| Test Point | $n$ | $m$ |
|:--:|:--:|:--:|
| 1 | $=10^2$ | $=5\times 10^4$ |
| 2 | $=10^2$ | $=1.5\times 10^5$ |
| 3 | $=10^3$ | $=1.5\times 10^6$ |
| 4 | $=10^4$ | $=1.5\times 10^7$ |
| 5 | $=10^5$ | $=1.5\times 10^8$ |
**[Sample 1]**
See the problem attachment `1.in/ans`.
**[Sample Explanation 1]**
You can use the Wireshark software installed on your computer to open the input and output of the sample testdata.
After opening the sample input file in Wireshark, you should see that the upper part of the interface lists all Ethernet frames recorded in this PCAP file, the middle part shows the parsing of the currently selected Ethernet frame, and the lower part shows its hexadecimal data. For detailed usage of this software, besides hints given in some problems, no further help will be provided. Please get familiar with it by yourself.
The sample input file records two Ethernet frames, both being ARP requests sent from `10.2.254.100` to `10.2.12.82`. The difference between them is the length of the extra data after the ARP request. Since the first frame length is no greater than 1000, and the second frame length is greater than 1000, according to the problem statement, only the first frame needs to be output. After sorting, it is written into the sample output file. You can also open the sample output file with Wireshark to verify this.
Translated by ChatGPT 5