Full waveform inversion (FWI) is a central and challenging method in exploration geophysics and seismology. It involves the iterative minimisation of a global criterion built on a data-fitting term, e.g. the misfit between the predicted and the observed seismic waveforms. Classical misfit functions are based on receiver-by-receiver signal comparison using L2 or L1 metrics, and suffer from well-known limitations such as cycle skipping, which lead to a strong sensitivity to the initial model and to observation noise. Here we explore another strategy that uses optimal transport theory, and the associated Monge-Kantorovich metrics, in the context of FWI to define misfit (distance) functions between entire 2D seismic images, e.g. common shot-gather sections, that exploit space-time coherence. The computation of the Monge-Kantorovich distance leads to a convex optimisation problem under linear constraints, which can be solved efficiently using a proximal splitting method. Preliminary results illustrate the interesting properties of optimal transport metrics, e.g. convexity and weaker sensitivity to the initial model. Extensions of the quadratic Monge-Kantorovich distance to signed measures and unbalanced transport are currently being investigated and may provide a practical path with important implications for full waveform inversion.
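
The contrast between a sample-by-sample L2 misfit and a transport-based misfit can be illustrated on a single time-shifted trace. The sketch below is a minimal 1D illustration only and not the method described above, which compares 2D images and relies on a proximal splitting solver. It makes several assumptions that are not from the text: a Ricker wavelet as the signal, a hypothetical helper `w2_squared_1d` that uses the closed-form 1D solution (matching quantiles) in place of a general solver, and squaring followed by normalisation to turn a signed trace into a probability density.

```python
import numpy as np


def ricker(t, t0, f0=10.0):
    """Ricker wavelet with peak frequency f0 (Hz), centred at time t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def l2_misfit(pred, obs, dt):
    """Classical least-squares misfit between two traces."""
    return 0.5 * np.sum((pred - obs) ** 2) * dt


def w2_squared_1d(pred, obs, t, n_quantiles=2000, eps=1e-10):
    """Squared quadratic Wasserstein distance between two 1D traces.

    Assumption (illustration only): each trace is mapped to a probability
    density by squaring and normalising. eps keeps both CDFs strictly
    increasing so that they can be inverted by interpolation.
    """
    p = pred ** 2 + eps
    q = obs ** 2 + eps
    cdf_p = np.cumsum(p / p.sum())
    cdf_q = np.cumsum(q / q.sum())
    # In 1D the optimal map matches quantiles, so W2^2 is the mean squared
    # difference between the two inverse CDFs.
    u = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.mean((np.interp(u, cdf_p, t) - np.interp(u, cdf_q, t)) ** 2)


dt = 0.001
t = np.arange(0.0, 2.0, dt)
obs = ricker(t, t0=1.0)

# Scan the misfit as a function of the time shift of the predicted trace.
shifts = np.linspace(-0.3, 0.3, 61)
l2 = np.array([l2_misfit(ricker(t, 1.0 + s), obs, dt) for s in shifts])
w2 = np.array([w2_squared_1d(ricker(t, 1.0 + s), obs, t) for s in shifts])
```

For a pure translation, the squared quadratic Wasserstein distance equals the squared shift, so `w2` should trace a parabola in `shifts`, whereas `l2` is not monotonic in the size of the shift, which is the behaviour associated with cycle skipping. The squaring step discards polarity, which is one reason a proper treatment of signed measures matters.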