Introduction

I wrote a bit about my experiences implementing the CDC-ACM device (USB serial port) on an ATSAMD21 microcontroller in my previous post. With that experience in hand, I set out to write a USB Audio device. My goal was to get a device that streams audio into the host computer; basically, something that acts as a microphone. 16-bit PCM audio at 48kHz seemed like a good starting point.

The USB Audio Device Class has three major versions. Version 1.0, released in 1998, is the earliest and still the most commonly used today. Even though the specification itself is more than two decades old, it is capable of 16/32-bit audio at sampling rates up to 96kHz, which is more than sufficient for my applications. Because it is so widespread, almost every OS ships with a driver for it, so if I can get my hardware compliant with Audio Device Class 1.0, I will have something that works across platforms without ever having to worry about drivers.

Descriptors

There is a great microphone example in the specification document for USB Audio Device Class 1.0. It lists all the necessary descriptors with explanations of every field. I started by following this example.

I will give a brief note on each descriptor, but you can refer to the specification document for more detailed explanations.

Device

The device descriptor is the root descriptor, so to speak. It contains the most basic information about the connected device, including the supported USB version, the USB VID/PID pair and the number of possible configurations. Three fields in the device descriptor, bDeviceClass, bDeviceSubClass and bDeviceProtocol, describe the device class as assigned by the USB-IF. Operating systems typically use these values (among other things) to load the appropriate device driver. Class code 0x01 is used for Audio, but the USB-IF class code table states that this class code can only be used in interface descriptors. So we leave this triplet as 0x00 and indicate the class codes in the interface descriptors that follow.
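To make the 0x00 class triplet concrete, here is a minimal sketch of a device descriptor written as a raw C byte array, a form many bare-metal USB stacks use. The VID, PID and bcdDevice values are placeholders, bcdUSB 0x0200 and a 64-byte bMaxPacketSize0 are my assumptions for a full-speed SAMD21 device rather than values copied from the spec example, and `LE16` is a hypothetical helper macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a 16-bit value into the little-endian byte
 * order USB descriptors use. */
#define LE16(x) (uint8_t)((x) & 0xFFu), (uint8_t)(((x) >> 8) & 0xFFu)

/* Device descriptor sketch. VID, PID and bcdDevice are placeholders. */
static const uint8_t device_descriptor[] = {
    18,           /* bLength */
    0x01,         /* bDescriptorType: DEVICE */
    LE16(0x0200), /* bcdUSB: USB 2.0 (assumed) */
    0x00,         /* bDeviceClass: class is declared per interface */
    0x00,         /* bDeviceSubClass */
    0x00,         /* bDeviceProtocol */
    64,           /* bMaxPacketSize0: endpoint 0 packet size (assumed) */
    LE16(0x1234), /* idVendor (placeholder) */
    LE16(0x5678), /* idProduct (placeholder) */
    LE16(0x0100), /* bcdDevice (placeholder) */
    0x01,         /* iManufacturer: string descriptor index 1 */
    0x02,         /* iProduct: string descriptor index 2 */
    0x00,         /* iSerialNumber: none */
    0x01,         /* bNumConfigurations */
};

/* Catch miscounted fields at compile time. */
static_assert(sizeof device_descriptor == 18, "device descriptor must be 18 bytes");
```

Writing descriptors as byte arrays rather than packed structs sidesteps compiler-specific packing attributes and keeps the little-endian layout explicit.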

Configuration

The configuration descriptor specifies how many interfaces the device has, whether the device is bus-powered or self-powered, the maximum current consumption if bus-powered, and some other information about the configuration. A device can have multiple configurations and switch between them, but a single configuration is sufficient for this simple device. The rest of the descriptors follow the configuration descriptor.
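The configuration descriptor can be sketched the same way. Its wTotalLength covers the configuration descriptor plus every descriptor after it, so the hypothetical `CONFIG_TOTAL_LENGTH` macro below adds up the sizes of the descriptors in the spec's microphone example, which comes to 100 bytes. The 100 mA bMaxPower is an assumption; set it to what your hardware actually draws.

```c
#include <assert.h>
#include <stdint.h>

#define LE16(x) (uint8_t)((x) & 0xFFu), (uint8_t)(((x) >> 8) & 0xFFu)

/* wTotalLength: the configuration descriptor plus everything after it,
 * assuming the descriptor set of the spec's microphone example:
 * configuration (9) + AC interface (9) + AC header (9) + input terminal (12)
 * + output terminal (9) + AS alt 0 (9) + AS alt 1 (9) + AS general (7)
 * + Type I format with one sample rate (11) + endpoint (9)
 * + class-specific endpoint (7). */
#define CONFIG_TOTAL_LENGTH (9 + 9 + 9 + 12 + 9 + 9 + 9 + 7 + 11 + 9 + 7)

static const uint8_t configuration_descriptor[] = {
    9,                         /* bLength */
    0x02,                      /* bDescriptorType: CONFIGURATION */
    LE16(CONFIG_TOTAL_LENGTH), /* wTotalLength */
    0x02,                      /* bNumInterfaces: AudioControl + AudioStreaming */
    0x01,                      /* bConfigurationValue: used by SET_CONFIGURATION */
    0x00,                      /* iConfiguration: no string */
    0x80,                      /* bmAttributes: bus powered (bit 7 must be set) */
    50,                        /* bMaxPower in 2 mA units: 100 mA (assumed) */
};

static_assert(CONFIG_TOTAL_LENGTH == 100, "descriptor set should total 100 bytes");
```

Computing the total from named sizes, instead of hard-coding 100, makes it harder to forget updating wTotalLength when a descriptor is added later.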

Interface (Audio Control)

This is the first interface of this device, and it is used for controlling the audio stream. Even though audio control has its own interface, that interface has no endpoint of its own in this example, so all Audio Control requests go through the default endpoint (endpoint 0). The requests will be discussed later in this writeup.
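Here is a sketch of the standard AudioControl interface descriptor as a byte array, with values following the spec's microphone example as I read it. The zero in bNumEndpoints is what sends Audio Control requests over endpoint 0.

```c
#include <assert.h>
#include <stdint.h>

/* Standard AudioControl interface descriptor (interface 0). */
static const uint8_t ac_interface_descriptor[] = {
    9,    /* bLength */
    0x04, /* bDescriptorType: INTERFACE */
    0x00, /* bInterfaceNumber */
    0x00, /* bAlternateSetting */
    0x00, /* bNumEndpoints: none, so requests use endpoint 0 */
    0x01, /* bInterfaceClass: AUDIO */
    0x01, /* bInterfaceSubClass: AUDIOCONTROL */
    0x00, /* bInterfaceProtocol */
    0x00, /* iInterface: no string */
};
```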

This is the first class-specific descriptor in this configuration, and it contains extra information about the audio control interface: the version of the Audio Device Class spec the device complies with, the total size of the class-specific descriptors, the number of streaming interfaces and the interface numbers of those interfaces.
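A sketch of this class-specific header follows. Note that its wTotalLength counts only the class-specific AudioControl descriptors (this header, the input terminal and the output terminal), not the whole configuration; `AC_CS_TOTAL_LENGTH` and `LE16` are hypothetical helper macros.

```c
#include <assert.h>
#include <stdint.h>

#define LE16(x) (uint8_t)((x) & 0xFFu), (uint8_t)(((x) >> 8) & 0xFFu)

/* Class-specific AudioControl descriptors: header (9) + input terminal (12)
 * + output terminal (9). */
#define AC_CS_TOTAL_LENGTH (9 + 12 + 9)

static const uint8_t ac_header_descriptor[] = {
    9,                        /* bLength: 8 + bInCollection */
    0x24,                     /* bDescriptorType: CS_INTERFACE */
    0x01,                     /* bDescriptorSubtype: HEADER */
    LE16(0x0100),             /* bcdADC: Audio Device Class 1.0 */
    LE16(AC_CS_TOTAL_LENGTH), /* wTotalLength of class-specific AC descriptors */
    0x01,                     /* bInCollection: one streaming interface */
    0x01,                     /* baInterfaceNr(1): the streaming interface is interface 1 */
};
```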

Before going into the rest of the descriptors, I will try to describe the structure of the audio class. An Audio Function (audio input, audio output, etc.) is divided into logical entities of two types: Units and Terminals. Each entity has a numeric ID, and the connections between entities are described using these IDs.

In a simplified sense, terminals are inputs/outputs for audio data from the viewpoint of the Audio Function (not the device, not the computer, but the concept of this audio function residing in the device). So in the case of this microphone, the actual microphone (or the analog-to-digital converter) is represented by an input terminal (ID 1), while the USB pipe streaming data out of the device into the computer is represented by an output terminal (ID 2).
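The ID-based wiring can be illustrated with a small, purely hypothetical firmware model; nothing in the spec or any USB stack requires it. Each entity stores its own ID and the ID of the entity feeding it, so tracing upstream from output terminal 2 reaches the microphone at ID 1. The struct, the function names and the use of 0 to mean "no source" are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one entity in the audio function. */
struct audio_entity {
    uint8_t id;         /* entity ID, as used in the descriptors */
    uint8_t source_id;  /* ID of the upstream entity; 0 means none */
    const char *name;
};

static const struct audio_entity topology[] = {
    { 1, 0, "Input Terminal (microphone)" },
    { 2, 1, "Output Terminal (USB streaming)" },
};

static const struct audio_entity *find_entity(uint8_t id)
{
    for (size_t i = 0; i < sizeof topology / sizeof topology[0]; i++) {
        if (topology[i].id == id)
            return &topology[i];
    }
    return NULL;
}

/* Follow source IDs upstream from `id`, writing visited IDs into `path`.
 * Returns how many entities were visited (at most max_len). */
static size_t trace_upstream(uint8_t id, uint8_t *path, size_t max_len)
{
    size_t n = 0;
    const struct audio_entity *e = find_entity(id);
    while (e != NULL && n < max_len) {
        path[n++] = e->id;
        e = (e->source_id != 0) ? find_entity(e->source_id) : NULL;
    }
    return n;
}
```

Calling `trace_upstream(2, path, 4)` fills `path` with 2 then 1, mirroring how the output terminal's bSourceID points back at the input terminal.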

Units are other control elements that add capabilities to the device. Some examples include the Mixer Unit (mixes audio channels), the Selector Unit (selects from a set of audio channel clusters) and the Feature Unit (adds basic controls like volume, mute and equalizer to an audio stream). The microphone example given in the USB ADC specification doesn't include any of these units.

Input Terminal

This specifies the properties of an input terminal; in the case of this device, a microphone. In this example the terminal has been given the ID 1 and identifies itself as a mono microphone.
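As a sketch, the input terminal descriptor might look like this. wTerminalType 0x0201 is the Microphone code from the USB Audio Terminal Types document, and a wChannelConfig of 0 (no spatial position bits) is how I understand a single mono channel is described; `LE16` is the same hypothetical helper as before.

```c
#include <assert.h>
#include <stdint.h>

#define LE16(x) (uint8_t)((x) & 0xFFu), (uint8_t)(((x) >> 8) & 0xFFu)

static const uint8_t input_terminal_descriptor[] = {
    12,           /* bLength */
    0x24,         /* bDescriptorType: CS_INTERFACE */
    0x02,         /* bDescriptorSubtype: INPUT_TERMINAL */
    0x01,         /* bTerminalID: referenced by the output terminal */
    LE16(0x0201), /* wTerminalType: Microphone */
    0x00,         /* bAssocTerminal: not paired with an output terminal */
    0x01,         /* bNrChannels: mono */
    LE16(0x0000), /* wChannelConfig: no spatial position bits for mono */
    0x00,         /* iChannelNames: no string */
    0x00,         /* iTerminal: no string */
};
```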

Output Terminal

This terminal is described as a USB Streaming terminal (since it is the connection out of the audio function into the host computer), has the terminal ID 2 and a source ID of 1. This means the audio data comes from the input terminal with ID 1 straight to this terminal. Once again, this is just a logical representation of the behaviour of the audio device; the actual implementation is up to the firmware.
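The matching output terminal can be sketched the same way; bSourceID is the byte that wires it to input terminal 1, and wTerminalType 0x0101 is the USB Streaming code. `LE16` remains a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define LE16(x) (uint8_t)((x) & 0xFFu), (uint8_t)(((x) >> 8) & 0xFFu)

static const uint8_t output_terminal_descriptor[] = {
    9,            /* bLength */
    0x24,         /* bDescriptorType: CS_INTERFACE */
    0x03,         /* bDescriptorSubtype: OUTPUT_TERMINAL */
    0x02,         /* bTerminalID */
    LE16(0x0101), /* wTerminalType: USB streaming */
    0x00,         /* bAssocTerminal: none */
    0x01,         /* bSourceID: audio comes from input terminal 1 */
    0x00,         /* iTerminal: no string */
};
```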

That concludes the descriptors under the audio control interface. Next we have the descriptors related to the actual transfer of audio data.

Audio Streaming Interface

The audio streaming interface has two alternate settings, one of which has no endpoints associated with it. The purpose of these two alternate settings is to give the operating system a way to free up the USB bandwidth used by the device without having to disconnect it. In fact, it's possible to have multiple alternate settings with varying bandwidth requirements.
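On the firmware side, the host switches between these settings with a standard SET_INTERFACE request (bRequest 0x0B), where wIndex carries the interface number and wValue the alternate setting. Here is a hypothetical handler; the function name, the `streaming_active` flag and the convention that returning false means "STALL the request" are my assumptions, not part of any particular USB stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AUDIO_STREAMING_INTERFACE 1 /* bInterfaceNumber of the AS interface */

/* Hypothetical state: true while alternate setting 1 is selected and the
 * host expects isochronous audio packets. */
static bool streaming_active = false;

/* Handle SET_INTERFACE; returns false if the request should be stalled. */
static bool handle_set_interface(uint16_t wIndex, uint16_t wValue)
{
    if (wIndex != AUDIO_STREAMING_INTERFACE)
        return wIndex == 0 && wValue == 0; /* AC interface has only alt 0 */

    switch (wValue) {
    case 0: /* zero-bandwidth setting: stop producing audio */
        streaming_active = false;
        return true;
    case 1: /* operational setting: start filling the isochronous endpoint */
        streaming_active = true;
        return true;
    default: /* no such alternate setting */
        return false;
    }
}
```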

Alternate Setting 0 - This is a standard interface descriptor with the AUDIO class and AUDIO_STREAMING subclass. No further descriptors follow it.

Alternate Setting 1 - This one has the same fields as the previous one, except for bAlternateSetting, which is 1, and bNumEndpoints, which is set to 1. Since this alternate setting has an endpoint used for streaming, several descriptors follow it.
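Side by side, the two alternate settings differ only in bAlternateSetting and bNumEndpoints. A sketch, with values following the spec's microphone example as I understand it:

```c
#include <assert.h>
#include <stdint.h>

/* AudioStreaming interface, alternate setting 0: no endpoints, no bandwidth. */
static const uint8_t as_interface_alt0[] = {
    9,    /* bLength */
    0x04, /* bDescriptorType: INTERFACE */
    0x01, /* bInterfaceNumber: interface 1 */
    0x00, /* bAlternateSetting: 0 */
    0x00, /* bNumEndpoints: none */
    0x01, /* bInterfaceClass: AUDIO */
    0x02, /* bInterfaceSubClass: AUDIO_STREAMING */
    0x00, /* bInterfaceProtocol */
    0x00, /* iInterface: no string */
};

/* Alternate setting 1: same interface, one isochronous endpoint. */
static const uint8_t as_interface_alt1[] = {
    9,    /* bLength */
    0x04, /* bDescriptorType: INTERFACE */
    0x01, /* bInterfaceNumber: interface 1 */
    0x01, /* bAlternateSetting: 1 */
    0x01, /* bNumEndpoints: one streaming endpoint */
    0x01, /* bInterfaceClass: AUDIO */
    0x02, /* bInterfaceSubClass: AUDIO_STREAMING */
    0x00, /* bInterfaceProtocol */
    0x00, /* iInterface: no string */
};
```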

General Descriptor

This is an Audio class-specific descriptor containing some information about the streaming interface. It has a field for the delay of this particular stream (bDelay), and wFormatTag, which specifies the data format used. PCM, which is used here, is a very common and easy-to-implement format.
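A sketch of the general descriptor follows. bTerminalLink is the field that ties this streaming interface to output terminal 2, wFormatTag 0x0001 selects PCM, and the one-frame bDelay is an assumption; `LE16` is the same hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define LE16(x) (uint8_t)((x) & 0xFFu), (uint8_t)(((x) >> 8) & 0xFFu)

static const uint8_t as_general_descriptor[] = {
    7,            /* bLength */
    0x24,         /* bDescriptorType: CS_INTERFACE */
    0x01,         /* bDescriptorSubtype: AS_GENERAL */
    0x02,         /* bTerminalLink: connected to output terminal 2 */
    0x01,         /* bDelay: in frames (1 assumed) */
    LE16(0x0001), /* wFormatTag: PCM */
};
```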

Type 1 Format Descriptor