Build applications that process and reason across images, audio, video, and documents using multi-modal foundation models.
A practical course artifact or project that demonstrates applied skill in building multi-modal applications.