JACK is a low-latency audio server. An audio application (a JACK client) can connect to the server and exchange audio with other applications. The server can, of course, also route the audio produced by an application to the soundcard and vice versa.
For more information about JACK, see:
Jack4j is a Java™ API that provides access to the JACK server. The API attempts to stay very close to the native JACK API (written in C), although some differences are, of course, dictated by the specifics of the Java™ language and its programming conventions.
For detailed information about Jack4j, see:
Some audio applications produce sound (for example, synthesizers), others receive sound (for example, recorders). Some applications do both (for example, sound effects). Some applications do neither, but instead provide additional services to others (for example, they control timing for audio playback).
From JACK's point of view, these kinds of applications differ only in the number of their input and output audio ports. Each application can create any number of these. Each port has a unique name. Apart from audio ports, there can also be other types of ports, such as MIDI (more on that later).
A pair of ports can be connected together; one of the connected ports (the source) must be an output port, and the other one (the target) must be an input port. Of course, the ports don't need to belong to the same application (but they can). This way you can, for example, connect a software synthesizer to an audio effect.
It's possible to connect an output port to multiple input ports (all targets receive the sound), or multiple output ports to a single input port (the target receives the mixed sound, i.e. the individual samples are summed).
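To illustrate the mixing rule, here is a minimal plain-Java sketch (not Jack4j API) of what happens when several output ports feed one input port: the samples at each position are simply added together.

```java
// Plain-Java illustration of how JACK mixes several sources into one input
// port: the sample values at each position are summed.
public class MixDemo {
    static float[] mix(float[][] sources, int bufferSize) {
        float[] mixed = new float[bufferSize];
        for (float[] source : sources) {
            for (int i = 0; i < bufferSize; i++) {
                mixed[i] += source[i];
            }
        }
        return mixed;
    }

    public static void main(String[] args) {
        float[] a = {0.1f, 0.2f, 0.3f, 0.4f};
        float[] b = {0.5f, 0.5f, 0.5f, 0.5f};
        float[] result = mix(new float[][] {a, b}, 4);
        System.out.println(java.util.Arrays.toString(result));  // approximately [0.6, 0.7, 0.8, 0.9]
    }
}
```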
Soundcards and similar hardware have their ports too. In fact, they look like any other client (except that their ports have the "IsPhysical" flag set). Thus, connecting an application to the soundcard output is done in exactly the same way as connecting it to another application. This brings flexibility, because applications don't have to care where their output goes or where their input comes from.
Note that all audio ports are monophonic (if you want stereo sound, create two ports and name them "left" and "right", for example). Also, all audio ports use the same audio format: uncompressed, with each sample represented by a native float (furthermore, Jack4j requires this native float type to be four bytes long). The sample rate is global for the whole server and all its clients. While some may see this as a limitation, it greatly simplifies the JACK API and the clients.
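Since ports are mono, an application that internally works with interleaved stereo data has to split it into two mono buffers, one per port. A minimal plain-Java sketch (again, not Jack4j API) of that split:

```java
// Plain-Java sketch: JACK ports are mono, so interleaved stereo data
// (L R L R ...) is split into two separate mono buffers, one per port.
public class StereoSplitDemo {
    static void deinterleave(float[] interleaved, float[] left, float[] right) {
        for (int frame = 0; frame < left.length; frame++) {
            left[frame]  = interleaved[2 * frame];      // even positions -> "left" port
            right[frame] = interleaved[2 * frame + 1];  // odd positions  -> "right" port
        }
    }

    public static void main(String[] args) {
        float[] left = new float[2], right = new float[2];
        deinterleave(new float[] {0.1f, 0.9f, 0.2f, 0.8f}, left, right);
        System.out.println(left[0] + " " + right[0]);   // 0.1 0.9
    }
}
```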
The really important part of the JACK architecture, though, is the processing model, because it allows audio processing with little or no unwanted latency. In JACK, all audio processing is done in process cycles. The JACK server invokes the process cycle periodically (usually many times per second). During each process cycle, each JACK client is asked exactly once to process a certain number of audio samples. The number of samples requested is called the buffer size, and this value (global for the whole server and all clients) usually ranges from a few tens to a few hundreds.
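For example, with a sample rate of 48000 Hz and a buffer size of 256, each process cycle covers 256 / 48000 ≈ 5.3 ms of audio, so the server runs roughly 187 cycles per second.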
Imagine that you have a software synthesizer, a software sound effect, and a soundcard. You connect the synthesizer's output ports to the sound effect's input ports, and the sound effect's output ports to the soundcard's input port. Thus, the sound flows from the synthesizer to the effect, and the effected sound goes to the soundcard. During each process cycle, the JACK server first asks the synthesizer to produce, say, 512 samples. Once the samples are ready, the JACK server asks the sound effect to process them. The effect consumes 512 "raw" samples and produces 512 effected samples. The JACK server then delivers this effected sound to the soundcard, which plays it.
The good thing here is that the sound is delivered to the soundcard in the same process cycle and is played "immediately". This holds true even if you add another five effects to the processing chain: no latency is added by the JACK system. The JACK server automatically calls the clients in the correct order (when possible), so the sound isn't unnecessarily delayed until the next process cycle. Also, each client is invoked exactly once during the process cycle, so it doesn't have to protect its input and output ports with extra buffers (which would otherwise add latency).
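As a rough illustration of this single-pass model, the following plain-Java sketch (not Jack4j code) runs one process cycle through an ordered chain of clients, each invoked exactly once, with the result flowing all the way to the "soundcard" within the same cycle:

```java
// Plain-Java sketch (not Jack4j code) of one process cycle run through an
// ordered chain: each "client" is asked exactly once to turn its input
// buffer into its output buffer.
public class ProcessChainDemo {
    interface Client {
        void process(float[] in, float[] out);     // handle exactly bufferSize samples
    }

    static float[] runCycle(Client[] orderedClients, float[] input, int bufferSize) {
        float[] current = input;
        for (Client client : orderedClients) {
            float[] next = new float[bufferSize];   // a real client would reuse port buffers
            client.process(current, next);
            current = next;
        }
        return current;                             // what the soundcard would play
    }

    public static void main(String[] args) {
        Client synth  = (in, out) -> java.util.Arrays.fill(out, 0.5f);   // produces sound
        Client effect = (in, out) -> {                                    // attenuates it
            for (int i = 0; i < out.length; i++) out[i] = in[i] * 0.8f;
        };
        float[] played = runCycle(new Client[] {synth, effect}, new float[4], 4);
        System.out.println(java.util.Arrays.toString(played));  // [0.4, 0.4, 0.4, 0.4]
    }
}
```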
So, does this mean that no latency occurs in a JACK-driven sound system? Yes and no. JACK itself doesn't normally add latency, but some delay may still occur as a result of "feedback" connections: imagine the effect's output connected back to the synthesizer's input; that sound would only be delivered in the next process cycle, and thus delayed by one buffer length, i.e. 512 samples. Sound applications and hardware may also add unwanted latency of their own: effects sometimes need to collect enough samples to perform spectral analysis, soundcards do some hardware buffering, and so on. However, feedback loops can't be solved by any sound subsystem (and aren't very common anyway), and the latency added by effects and hardware is inherent to the principles of computer-based sound processing.
All JACK clients are based on callbacks. Whenever the JACK server needs anything from the client, it calls the client's callback function (a native C function; Jack4j translates this call into a Java method call).
The most important callback is the process callback. This is the function (or, in Jack4j, a client method) that is called once during each process cycle. When called, it should process the data from the client's input ports and must produce data for all output ports.
Because the method is called very often, and other JACK clients are waiting for their turn, it must finish very quickly and should not perform operations that are CPU-intensive or could block. Using I/O operations during the process callback is not recommended (if possible, copy the data elsewhere and let another thread handle it), and the use of synchronization mechanisms such as monitors or locks is generally unwise. It's also recommended to avoid memory allocation and deallocation in the process callback; this is quite difficult to avoid entirely in Java, but you should preallocate data structures whenever it is possible and reasonably practical.
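One common way to follow these rules is to hand the audio data over to a worker thread through preallocated buffers, so the process callback itself never blocks or allocates. The following is an illustrative sketch of such a single-producer/single-consumer ring buffer (our own helper class, not part of the Jack4j API):

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal single-producer/single-consumer ring of preallocated audio blocks.
// The process callback only copies samples and advances a counter; a separate
// worker thread does the slow work (e.g. writing to disk). Illustrative only,
// not part of the Jack4j API.
public class AudioRing {
    private final float[][] slots;
    private final AtomicLong writeCount = new AtomicLong();
    private final AtomicLong readCount  = new AtomicLong();

    public AudioRing(int slotCount, int bufferSize) {
        slots = new float[slotCount][bufferSize];    // preallocated once, reused forever
    }

    /** Called from the process callback: never blocks, never allocates. */
    public boolean push(float[] samples) {
        long w = writeCount.get();
        if (w - readCount.get() == slots.length) {
            return false;                            // ring is full, drop this block
        }
        System.arraycopy(samples, 0, slots[(int) (w % slots.length)], 0, samples.length);
        writeCount.set(w + 1);                       // publish the block to the worker thread
        return true;
    }

    /** Called from the worker thread: copies one block into dest, if available. */
    public boolean poll(float[] dest) {
        long r = readCount.get();
        if (r == writeCount.get()) {
            return false;                            // nothing to consume yet
        }
        System.arraycopy(slots[(int) (r % slots.length)], 0, dest, 0, dest.length);
        readCount.set(r + 1);                        // release the slot back to the producer
        return true;
    }
}
```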
Another callback that is important for Jack4j clients is the thread initialization callback. This callback is called once, when the client is activated. The "default" implementation of this callback in Jack4j registers the JACK client thread (created by the lower-level JACK infrastructure) with the JVM and makes it possible to call the other Jack4j callbacks later. In practice, every Jack4j client should register the default thread initialization callback before activation; otherwise the other callbacks can't be used at all.
There are also other callbacks, mostly informative (the client can be notified when a new port is registered by another client, when the sample rate changes, etc.). See the Javadoc for details. Some callbacks are related to the JACK transport infrastructure, and we will discuss them later.
Apart from audio ports, it's also possible to create MIDI ports. They can be interconnected just like audio ports (it's even possible to connect multiple outputs to a single input). The JACK server can provide MIDI ports that represent MIDI devices available from the underlying sound system (such as ALSA).
The data sent and received on MIDI ports are byte representations of MIDI messages. During its process callback, a JACK client can obtain these byte representations from input ports and send them to output ports. However, no higher-level abstraction is currently provided by JACK or Jack4j, so the application has to encode and decode the messages itself.
An interesting feature of JACK MIDI support is that although MIDI messages are delivered in buffers (all events that occurred during the current process cycle are delivered in a single buffer), each message carries a "timestamp" that specifies exactly when the MIDI event occurred.
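Since the application gets only raw bytes, it has to interpret them according to the MIDI specification itself. A small plain-Java sketch of decoding note-on/note-off messages (the byte layout is standard MIDI; the method and its parameters are our own, not Jack4j API):

```java
// Plain-Java sketch of decoding a raw MIDI message as it could arrive on a
// JACK MIDI input port. The status byte identifies the message type and
// channel; the following data bytes carry note number and velocity.
public class MidiDecodeDemo {
    static void decode(byte[] message, int timestamp) {
        int status  = message[0] & 0xF0;    // upper nibble: message type
        int channel = message[0] & 0x0F;    // lower nibble: MIDI channel
        if (status == 0x90 && message[2] != 0) {
            System.out.printf("t=%d: note on, channel %d, note %d, velocity %d%n",
                    timestamp, channel, message[1], message[2]);
        } else if (status == 0x80 || status == 0x90) {
            // 0x80 is note off; note on with velocity 0 is treated as note off too
            System.out.printf("t=%d: note off, channel %d, note %d%n",
                    timestamp, channel, message[1]);
        }
    }

    public static void main(String[] args) {
        decode(new byte[] {(byte) 0x90, 60, 100}, 37);  // note on, middle C
    }
}
```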
Many audio applications use some concept of a "timeline". For example, a sequencer contains a song of a certain length, is positioned at a certain place in the song (e.g. the beginning of the first chorus), and is either playing or paused. If there are several such applications, they have to be synchronized somehow. For this purpose, the JACK server provides a transport.
The transport is basically a virtual "playhead". It has a position (expressed as the number of frames since the beginning of the "song") and is either moving (playing) or not. Currently, the transport can only move at a constant speed: one "transport frame" per audio frame.
JACK clients can query the current position and state of the transport, and can also start it, stop it, or reposition it. Timeline-aware clients usually query the transport position during their process callback and react to it.
Note that there's only one transport in the JACK server.
Also note that in Jack4j, transport-related methods are available in the JackTransportClient class, which is a subclass of JackClient. A transport-aware client must extend JackTransportClient (or AbstractJackTransportClient).
Because a frame number is not exactly the information used in real-world applications, it's necessary to "translate" it into a more meaningful format. For example, musical applications usually work with the concept of bars divided into beats. A sequencer needs to know which bar and beat is currently playing so that it can produce the correct notes. Video applications, on the other hand, need to translate the audio frame number into a video frame number. Such translations can be quite complex (if the musical tempo or time signature changes during the song, some computation is necessary). For this reason, the JACK server doesn't provide these "translations" directly, but leaves the hard work to a specialized client called the timebase master.
Any client can become the timebase master if it wants to (but there can be only one at a time). At the beginning of each process cycle, the JACK server asks the timebase master (if there is one) to compute additional information from the current frame number. This is done by calling a special callback provided by the client.
There are several kinds of additional information. Currently they are: bar/beat/tick (used by musical applications), external timecode (SMPTE), and video information (A/V ratio and video frame offset). The timebase master doesn't necessarily compute all of these; for instance, the TimebaseMaster example client from the Jack4j package only computes the bar/beat/tick information (and does so in a very simplistic manner).
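To give a feel for what such a "simplistic" bar/beat/tick translation involves, here is an illustrative computation (our own arithmetic, not the Jack4j example code) that assumes a constant tempo and a fixed 4/4 time signature; a real timebase master would also have to handle tempo and meter changes:

```java
// Illustrative computation of bar/beat/tick from a transport frame number,
// assuming constant tempo and time signature (not Jack4j code).
public class BarBeatTickDemo {
    public static void main(String[] args) {
        long frame = 480000;             // current transport position in frames
        double sampleRate = 48000.0;
        double bpm = 120.0;              // beats per minute
        int beatsPerBar = 4;
        int ticksPerBeat = 1920;

        double framesPerBeat = sampleRate * 60.0 / bpm;     // 24000 frames per beat
        long totalBeats = (long) (frame / framesPerBeat);   // 20 full beats elapsed
        long bar  = totalBeats / beatsPerBar + 1;            // bars and beats count from 1
        long beat = totalBeats % beatsPerBar + 1;
        long tick = (long) ((frame % framesPerBeat) / framesPerBeat * ticksPerBeat);

        System.out.printf("bar %d, beat %d, tick %d%n", bar, beat, tick);  // bar 6, beat 1, tick 0
    }
}
```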
Some clients can't respond immediately to transport repositioning. Such clients are called "slow-sync clients". They have to register a special callback that is invoked whenever someone attempts to reposition the transport. The callback must indicate whether the client is ready or not; if any slow-sync client is not ready, the repositioning is delayed to a later process cycle. The slow-sync callback is then invoked on each following process cycle until all such clients become ready (or until the slow-sync timeout expires; the default is two seconds).
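The pattern a slow-sync client typically follows can be sketched as below. This is a conceptual illustration only; the method name and signature are made up and do not reflect the actual Jack4j callback API. The idea is that a reposition request starts a slow background seek, and the client reports "ready" only once that seek has finished:

```java
// Conceptual sketch of the slow-sync pattern (method names are illustrative,
// not the Jack4j callback signature): a reposition request starts slow
// background work, and the client reports "ready" only when it is done.
public class SlowSyncSketch {
    private volatile long preparedFrame = -1;   // position we have finished preparing
    private long seekingFrame = -1;             // position currently being prepared

    /** Invoked (conceptually) each cycle while the transport wants to reposition. */
    public boolean readyForPosition(long targetFrame) {
        if (preparedFrame == targetFrame) {
            return true;                         // seek finished, transport may roll
        }
        if (seekingFrame != targetFrame) {       // start the slow work only once
            seekingFrame = targetFrame;
            startSeekInBackground(targetFrame);
        }
        return false;                            // not ready yet, keep the transport waiting
    }

    private void startSeekInBackground(long targetFrame) {
        new Thread(() -> {
            // ... locate the right place in a file, fill playback buffers, etc. ...
            preparedFrame = targetFrame;
        }).start();
    }
}
```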