ParaCrawl Corpus release v5.1
The corpus is released as part of the ParaCrawl project, co-financed by the European Union through the Connecting Europe Facility. Version 5.1 builds on the same raw corpus as version 5. Thanks to improvements in the filtering procedure, the official subset released as version 5.1 is larger for almost all language pairs (the exceptions are ga, de, sl and et). Extrinsic evaluation through machine translation (MT) on several language pairs also shows an improvement in quality.