Packetry: Building a High Performance Protocol Analysis Tool
In a previous update, we introduced our work on Packetry, our new front-end software for using Cynthion to capture and analyze USB traffic on the wire. In this update, we’re going to talk a bit more about the design of that software and explain some of the work we’re doing to make it as fast and easy to use as possible.
The Need for Speed
One of the most exciting features of Cynthion is its ability to serve as a protocol analyzer, letting you intercept and capture USB traffic on the wire, in the field, from real devices.
Fully benefiting from this feature requires the help of some very efficient software on the capture host. A high-speed USB capture, including metadata, can generate over half a gigabit per second of raw data. To see what’s happening in real time as the capture progresses, all that data needs to be processed at least as fast as it’s captured, which is close to 50 megabytes per second.
In fact, real-time performance isn’t really enough in practice. When opening a saved capture from a file, the software needs to process packets many times faster than real time. At real-time decoding speed, a busy minute-long capture would also take a whole minute to load. So we need a solution that can process USB packet data many times faster than real-time speeds, which means a throughput of hundreds of megabytes per second at least.
Because of these requirements, we’ve designed Packetry from the ground up to achieve the fastest possible decoding speeds and to scale seamlessly to captures of unlimited size, bounded only by disk space. We’re also thinking ahead: Packetry has been developed for USB 2.0 analysis with Cynthion, but in the future we may want to use it to analyze higher speed protocols.
All these factors have made performance critical to success: so how did we achieve it?
Laziness as a Virtue
To achieve the speed and scalability we need, we must do the minimum work necessary for each packet at capture time. We don’t need to fully interpret all traffic as it’s captured: we just need to follow each packet’s effect on the protocol and store things in a form we can efficiently access later.
Rather than constructing individual data structures to represent packets, transactions and other protocol elements, we simply write all captured packets out into a flat bytestream in order. In this form, the capture can be very efficiently written to a file as it progresses.
As we do so, we build a set of indexes which describe where to find the start of each packet, the start of each transaction, and so forth. Those indexes are how we describe the protocol structure. They are designed to provide just enough information for the UI to later look up any part of the capture, decode it on demand, and display it.
Each index is a monotonically increasing sequence of integers. Because of that, we can store them very efficiently. We use a simple compression scheme to minimize the storage overhead of the indexes. To ensure scalability, these indexes are also streamed to files themselves. As such, capture size is limited only by available storage, never by available memory. All data structures in memory have a bounded size which is independent of the size of the capture.
In this approach, we gain scalability by giving up a little speed at display time. This is a good tradeoff to make, because, unlike capturing or loading data which may happen at extremely high speeds, displaying data is constrained by the bandwidth of the human user. There can only be so many things on screen at once, and the user can only browse through them at human speeds. We do not have to make rendering the display very fast for it to feel instantaneous.
Traffic Display
USB is a highly structured protocol: packets are grouped into transactions, transactions into transfers, and transfers are attached to specific endpoints on devices. Our UI displays traffic hierarchically according to that structure, making captures easy to understand and explore. A similar design approach was pioneered in ViewSB, but in Packetry we’ve now made it fast and scalable to large high-speed captures.
Our GUI has been built on GTK 4, which has built-in support for displaying large lists and trees by lazily loading only the parts currently visible on screen, recycling UI widgets for efficiency, and preloading ahead of scrolling. When you scroll through the traffic view in Packetry, the packets required are loaded on demand using the capture indexes, decoded on the fly, and used to generate the summaries you see of packets, transactions and transfers. All this happens live, too: if you’re running a capture, you’ll see new traffic appear, and the descriptions of existing items may be updated as further packets come in. When you load a capture from a file, you can start exploring it immediately, even while the packets later in the file are still being indexed.
Threading Model
It’s been a while since individual CPU cores got significantly faster; these days performance gains usually come from parallelization to take advantage of multiple cores. However, some tasks just can’t be parallelized, and the only option is to make them as fast as possible on a single thread.
When analyzing packets captured on the wire, every packet matters to the overall state of the protocol. The need to deal with invalid packets and protocol errors means it’s not possible to make assumptions about structure. Interpreting traffic correctly requires looking at every packet one by one, in order, and updating the protocol state at each step. That means our packet decoder has to run as a single thread and be highly optimized for throughput.
We can, however, move everything else out to separate threads so that the core decoder can run as fast as possible. Packetry runs as three threads, each feeding data to the next:
The capture thread deals with streaming captured packets from a Cynthion device over USB.
The decoder thread processes captured packets, stores them and builds the capture indexes.
The UI thread runs the user interface, reading from the indexes and stored packets to display the captured traffic in a human-readable view.
A key feature of the design is that all the interactions between these three threads are lock-free: they cannot block each other. The capture thread supplies packets to the decoder thread through a FIFO queue. The decoder and UI threads use our own lock-free stream implementation, which allows a single writer and any number of reader threads to efficiently share a growing data stream whilst it is being buffered and written out to storage, then mapped back into memory as needed.
Keeping these threads decoupled from each other helps us ensure that capture will always continue to run at a consistent throughput, no matter how complex the the traffic is or what analysis is being done in the UI.
Head to Head
So how fast is it? To give a quick illustration, here’s Packetry and Wireshark loading the same file, side by side. The file is a 300MB capture in pcap format of a HackRF in use.
Packetry finishes loading the file 10x faster — but you don’t even need to wait for that to happen to start exploring the capture. The view is ready to interact with as soon as the first packets are decoded. That happens almost instantly, and you’re up and running immediately with a fully hierarchical, human-readable view of the traffic.
Trying it out
Packetry is still in active development and we don’t have releases or binary downloads yet. If you’re really keen to play with Packetry you can build it from source and try it out with the example capture files that are included in the repository for testing. Build instructions are in the repository.
Pseudo-Doppler Redux, ShmooCon 2018
Back in 2018 Michael Ossmann teamed up with Schuyler St. Leger at ShmooCon to present “Pseudo-Doppler Redux”; a talk about taking a modern approach to the implementation of pseudo-doppler direction finding (DF) with Software Defined Radio (SDR). This presentation demonstrates what pseudo-doppler direction finding is and gives an example of Opera Cake usage.
We hope you enjoy watching the presentation!
Talking about GreatFET with Limor "Ladyada" Fried at Adafruit, 2019
In this interview, Michael Ossmann visits Adafruit in New York and chats with Limor “Ladyada” Fried about GreatFET and HackRF. The two talk about what GreatFET neighbours are, how to design GreatFET neighbours, and Mike demonstrates how to use a wiggler to separate neighbours from a GreatFET. This is followed up with a short discussion on HackRF and Portapack and how they work together.
Reverse Engineering Black Box Systems with GreatFET, Troopers 2018
In this presentation at Troopers 2018, Kate Temkin and Dominic Spill used GreatFET One and the Facedancer software framework to demonstrate techniques for reverse engineering embedded USB hosts.
It is often fairly simple to set up an environment for reversing a USB device; you just plug it into a host that you control. Then you can manipulate software on the host to test or monitor USB communications between the host and device. Even if the host operating system doesn’t provide a way for you to monitor USB (hint: it probably does), you can run it inside a virtual machine that runs on top of Linux and use Linux’s usbmon capability.
But how do you sniff USB if the USB host is an embedded platform that you don’t control? What if it is a game console or a photocopier with software that you can’t run in a virtual machine? Kate and Dominic show how you can use GreatFET One and a laptop to proxy USB between a device and a host without controlling software on either the device or the host. With the USBProxy solution they implemented in Facedancer, it is possible not only to monitor USB communication but also to modify USB data in transit.
Additionally they demonstrate how the Facedancer software for GreatFET can be used to emulate a USB device, allowing them to reverse engineer “black box” USB hosts and test them for vulnerabilities.
Making USB Accessible, Teardown 2019
On Sunday, Kate Temkin and Mikaela Szekely presented Making USB Accessible: Developing Ultra-low-cost, Open USB Tools at Teardown 2019 in Portland. In this well-received talk, they debuted ViewSB, a USB analyzer that supports various capture backends including GreatFET, OpenVizsla, and usbmon.
In the days leading up to the talk, Kate went on a tear, developing ViewSB to complement the hardware solutions for USB capture that she and Mikaela had been working on. I asked, “Why do we need ViewSB when we already have tools such as PulseView and Wireshark?”
Her answer was that the existing open source software tools for USB analysis don’t present data in a way that is useful enough for USB developers. I recalled my past confusion about USB nomenclature and how the most essential thing I learned from Kate’s training class at hardwaresecurity.training last year had been an understanding of the differences between USB packets, transactions, and transfers. Thinking back to the tools we used in that class, I realized that she was right that a new tool was needed. In fact, the limitations of the existing tools were probably largely responsible for my confusion!
As you can see in this video, ViewSB presents low level USB packet data in a visual format that groups packets together into transactions, something that I had previously seen only in software for proprietary USB analyzers. It makes USB much easier to understand. I wholeheartedly agree with Mikaela and Kate that their work makes USB accessible!
Code used in the presentation can be found in the usb-tools organization on GitHub.
GreatFET on Hak5
I recently sat down with Darren Kitchen to record a couple Hak5 episodes. First we introduced GreatFET One to his viewers and demonstrated using its Facedancer capability to emulate a USB device. Then we did some infrared hacking with Gladiolus, a prototype GreatFET neighbor we plan to release later this year. Thanks for having me on the show, Darren!
Defeating Spread Spectrum Communication with Software Defined Radio, ToorCon 2013
Fortunately in this video you can't hear the jackhammers at work in the hotel lobby while I gave this presentation at the ToorCon San Diego seminars in October, 2013. Apart from having to talk over the construction noise, it was great to share SDR techniques that can be used to point out flaws in security claims made about spread spectrum communication technologies.
One of the things I showed in the talk was how Direct Sequence Spread Spectrum (DSSS) communications can be reverse engineered. I used SPOT Connect, a device operating on the GlobalStar satellite network as an example. A couple years later, Colby Moore did a more complete job of showing how the GlobalStar system can be attacked.
If you aren't familiar with the Pastor Manul Laphroaig, mentioned at the beginning of this talk, check out our PoC||GTFO mirror.
Rapid Radio Reversing, ToorCon 2015
In this video of Michael Ossmann’s presentation at ToorCon 2015, he demonstrates how helpful it can be to use a combination of both SDR and non-SDR tools for reverse engineering wireless systems. Michael uses both HackRF One and YARD Stick One to reverse engineer a wireless cabinet lock.
You can download and watch the video on Internet Archive here.
The code from the presentation is in Michael Ossmann’s stealthlock repository.