| Language |  | Sentences | Source Words |  |  |  |  |
|---|---|---|---|---|---|---|---|
| Khmer |  | 65,113 | 1,511,950 | 21,560,446 | 21,565,078 | 65,113 | 1,511,950 |
| Burmese |  | 31,374 | 661,577 | 40,590,354 | 40,595,755 | 31,374 | 661,577 |
| Nepali |  | 92,084 | 2,941,031 | 36,454,553 | 36,466,101 | 92,084 | 2,941,031 |
| Pashto |  | 26,321 | 692,651 | 2,587,950 | 2,593,163 | 26,321 | 692,651 |
| Singhalese |  | 217,407 | 5,791,982 | 38,720,907 | 38,724,422 | 217,407 | 5,791,982 |
| Somali |  | 14,879 | 506,201 | 28,387,922 | 28,396,227 | 14,879 | 506,201 |
| Swahili |  | 132,517 | 3,696,543 | 84,605,506 | 84,605,506 | 132,517 | 3,696,543 |
| Tagalog |  | 248,684 | 6,327,801 | 108,260,601 | 108,260,601 | 248,684 | 6,327,801 |

Bonus Release - Last Updates on Sep 2021

| Language |  | Sentences | Source Words |  |  |  |  |
|---|---|---|---|---|---|---|---|
| Polish-Czech | New | 24,001,403 | 288,826,678 | 24,001,403 | 288,826,678 | 6,055,618,075 | 28,559,061,699 |
| Ukrainian | New | 13,354,365 | 505,831,880 | 13,354,365 | 505,831,880 | 235,700,383 | 5,832,658,894 |
| Chinese | New | 14,170,585 | 217,604,664 | 14,170,585 | 217,604,664 | 1,207,487,761 | 8,953,713,029 |
| Russian |  | 5,377,911 | 101,312,142 | 5,377,911 | 101,312,142 | 491,941,804 | 492,260,972 |
| English-Korean |  | 4,002,441 | 61,963,744 | 4,002,441 | 61,963,744 |  |  |
| Dutch-French |  | 2,687,331 | 60,504,313 | 2,687,331 | 60,504,313 | 38,164,560 | 770,141,393 |
| Polish-German |  | 916,522 | 18,883,576 | 916,522 | 18,883,576 | 11,060,105 | 202,765,359 |
Two formats of BiCleaner files are provided: you can download either the TMX format or the plain TXT format. To convert a TMX file to a tab-separated text file efficiently, download the TMXT tool.
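For readers who want to see what such a conversion involves, the following is a minimal sketch in Python using only the standard library. It is not the TMXT tool and makes no claim about how that tool works. It assumes the standard TMX layout (`tu` units containing `tuv` variants with an `xml:lang` attribute and a `seg` child). The function name `tmx_to_tsv` and the language codes passed to it are hypothetical and would need to match the codes actually used in the downloaded file.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_tsv(tmx_source, out, langs):
    """Write one tab-separated line per translation unit (hypothetical helper).

    tmx_source: path or binary file object holding TMX data.
    out:        text file object to write to.
    langs:      language codes, in the column order wanted in the output.
    Returns the number of lines written.
    """
    written = 0
    # iterparse streams the file, so large TMX files are not loaded at once.
    for _event, elem in ET.iterparse(tmx_source, events=("end",)):
        if elem.tag != "tu":
            continue
        segments = {}
        for tuv in elem.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang is not None and seg is not None:
                text = "".join(seg.itertext())
                # Collapse tabs and newlines so each unit stays on one line.
                segments[lang] = " ".join(text.split())
        # Skip units that lack any of the requested languages.
        if all(lang in segments for lang in langs):
            out.write("\t".join(segments[lang] for lang in langs) + "\n")
            written += 1
        elem.clear()  # drop the unit's children once it has been processed
    return written
```

A call such as `tmx_to_tsv("corpus.tmx", open("corpus.tsv", "w", encoding="utf-8"), ["en", "km"])` would then produce one sentence pair per line, where the file names and language codes are placeholders.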