Steps to Reproduce:
Take any SVG file with the first <switch> tag appearing after $wgSVGMetadataCutoff (256kB).
Actual Results:
no translations dropdown to choose
Expected Results:
translations dropdown to choose
Subject | Repo | Branch | Lines +/-
---|---|---|---
Change $wgSVGMetadataCutoff default to 5 MiB (previously 512KiB). | mediawiki/core | master | +12 -10
For performance reasons, not scanning large SVGs to the end for <switch> tags might be the expected behavior.
Two proposals: increase the number of bytes read or shift multilingual testing to upload time (when the file is read anyway).
In T40010, Ponor looked at 30 SVG files and stated the mean file size was 700 kB. JoKalliauer stated that only about 500 SVG files are being uploaded every day. Johannes also says that SVGs are 2.8 percent of uploads.
SVG illustrations typically place text on top of a drawing, so most text elements will be at the end of the file.
At one point, SVG uploads were limited to 10 MB. I do not know if that limit is still in effect.
I do not know how long it takes for MW to parse an XML file.
When I've run into this problem, I've used two workarounds.
One is to add a hidden switch near the top of the file:
    <switch visibility="hidden">
      <text systemLanguage="en">English</text>
      <text systemLanguage="de">Deutsch</text>
      <text>English</text>
    </switch>
The second is to add a similar switch to the defs element:
    <defs>
      <g id="legend">
        <switch>
          <text systemLanguage="en">English</text>
          <text systemLanguage="de">Deutsch</text>
          <text>English</text>
        </switch>
      </g>
    </defs>
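To tell whether a given file needs one of these workarounds, one can look for the byte offset of the first <switch> tag and compare it with the cutoff. The following is a rough Python sketch, not MediaWiki code: the function names and the two cutoff constants are assumptions made for illustration (the real value comes from $wgSVGMetadataCutoff), and the regular expression is a plain byte search that could also match inside a comment or CDATA section.

```python
import re

# Assumed cutoff values taken from this task; not read from any wiki config.
OLD_CUTOFF = 256 * 1024
NEW_CUTOFF = 5 * 1024 * 1024

# Opening <switch> tag, with or without a namespace prefix such as "svg:".
SWITCH_TAG = re.compile(rb"<(?:[A-Za-z_][\w.-]*:)?switch[\s>/]")


def first_switch_offset(svg_bytes):
    """Return the byte offset of the first <switch> tag, or None if absent."""
    match = SWITCH_TAG.search(svg_bytes)
    return match.start() if match else None


def switch_within_cutoff(svg_bytes, cutoff):
    """True if a <switch> tag starts before the given byte cutoff.

    This is an approximation: it only checks where the tag starts.
    """
    offset = first_switch_offset(svg_bytes)
    return offset is not None and offset < cutoff


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    print("first <switch> at byte:", first_switch_offset(data))
    print("within old cutoff:", switch_within_cutoff(data, OLD_CUTOFF))
    print("within new cutoff:", switch_within_cutoff(data, NEW_CUTOFF))
```

A file that reports False for the old cutoff but True for the new one is the kind of file this task is about.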
SVG Translate offers to translate the text, and if users add a translation, it will show up on the File page.
SVG Translate could always add such an element near the front of the file. A trick would be to set the id to an SVG Translate GUID. Then SVG Translate could always add the language without offering it to the user.
I think we should increase the default limit. 512kb seems really low, when MW has hundreds of megabytes of ram. I'm not saying it should be unlimited, but 2MB sounds entirely reasonable to me.
Actually, it looks like this is using XMLReader, so memory usage should be quite low. If there were to be some sort of DoS issue, it would probably be with recursive entity expansion, which would not be prevented by the cut-off. (However, libxml does have better checks against this nowadays.)
With that in mind, I think it makes sense to increase to 5MB.
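To illustrate why a streaming reader keeps memory low regardless of file size, here is a minimal Python sketch that uses xml.etree.ElementTree.iterparse as a stand-in for PHP's XMLReader. It is not the MediaWiki implementation. The function name is hypothetical, and the rule of counting only systemLanguage attributes on direct children of <switch> is an assumption of this sketch. It is also not hardened against hostile input such as entity expansion.

```python
import io
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def collect_languages(source):
    """Stream through an SVG and return the set of language tags found in
    systemLanguage attributes on direct children of <switch> elements."""
    languages = set()
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are already available on the "start" event.
            parent_is_switch = bool(stack) and stack[-1] == SVG_NS + "switch"
            value = elem.get("systemLanguage")
            if parent_is_switch and value:
                # systemLanguage may hold a comma-separated list of tags.
                languages.update(t.strip() for t in value.split(",") if t.strip())
            stack.append(elem.tag)
        else:
            stack.pop()
            # Drop the finished element's content so memory stays small.
            elem.clear()
    return languages


if __name__ == "__main__":
    sample = (b'<svg xmlns="http://www.w3.org/2000/svg"><switch>'
              b'<text systemLanguage="en">English</text>'
              b'<text systemLanguage="de">Deutsch</text>'
              b'<text>English</text></switch></svg>')
    print(sorted(collect_languages(io.BytesIO(sample))))
```

Because each element is cleared as soon as it closes, the sketch never holds the content of the whole document at once, which is the property that makes a larger cutoff cheap in terms of memory.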
Change 1000386 had a related patch set uploaded (by Brian Wolff; author: Brian Wolff):
[mediawiki/core@master] Change $wgSVGMetadataCutoff default to 5 MiB (previously 512KiB).
It should be noted that we actually read through the entire SVG with XMLReader, with no cut-off, in UploadBase::detectScriptInSvg(), so maybe we should get rid of the cut-off entirely, since we do it anyway (although metadata is potentially read more often than the security checks are done).
Change 1000386 merged by jenkins-bot:
[mediawiki/core@master] Change $wgSVGMetadataCutoff default to 5 MiB (previously 512KiB).
Alternatively, the database could include all the langtags discovered at file upload, so the SVG file would not have to be reread to build a page.
For reference, this is actually how it works.
@Glrx Do you think upping the limit to 5MB is sufficient to call this bug fixed?
Some time ago, I learned that the langtags were stored in the MW database (they are a bit buried in the API). I'm not a MW expert.
Yes, 5 MB is enough to close this issue. That size is well above the typical size, and SVG files that are above 5 MB probably have other issues. I've fixed several SVG files with this problem. IIRC, the file sizes were usually less than 1 MB (it was a 256 kB limit rather than 512 kB).
The biggest file I recall is https://commons.wikimedia.org/wiki/File:2022_Russian_invasion_of_Ukraine.svg which was probably 2 MB at the time. It has now grown to 3.7 MB (apparently gaining 1.5 MB when the base map was improved in August 2023). It is a map that has such detail that it is not expected to be viewed in MW directly; users will download and view the SVG so they can pan and zoom the image.
I would still encourage that SVG Translate add a hidden switch element at the start of the SVG file, but that is a separate issue.
For reference, on Commons, there are 43066 SVGs that are > 5MB out of 2 419 905 in total (1.7%).
For images where we have detected translations (however, this will miss any images where this bug is present, so it may not be a useful stat), 13 out of 4771 (0.27%) are larger than 5MB. The list is below:
+-------------------------------------------------+------------+
| img_name                                        | Size (MiB) |
+-------------------------------------------------+------------+
| 1979_United_Kingdom_EU_Election.svg             |    24.4278 |
| Bahnstrecke_Oberhausen–Arnhem_Karte.svg         |    17.7614 |
| Corsica-geographic_map.svg                      |    14.2364 |
| Geographic_map_of_Carpathian_mountains_CS.svg   |    10.7202 |
| Indian_General_Election_2014_by_alliance.svg    |     5.0228 |
| Iran-geographic_map-es.svg                      |    14.1260 |
| Iran-geographic_map.svg                         |    12.7000 |
| Iran-geographic_map_clean.svg                   |     8.6157 |
| Iran_Faults_map.svg                             |    12.8078 |
| Neubaustrecke_Rhein-Main-Rhein-Neckar_Karte.svg |     6.9020 |
| Pannonian_Basin_geographic_map-es.svg           |    10.2899 |
| Pannonian_Basin_geographic_map.svg              |     9.5815 |
| İran_coğrafya_haritası.svg                      |    12.5943 |
+-------------------------------------------------+------------+
Anyways, calling this done. If the limit is still causing problems in any significant way, people can reopen this task or make a new one.
We might have to run a forced metadata refresh on the SVGs. Otherwise, I think SVGs whose first <switch> falls between the old and new cutoff values would need a re-upload before their new metadata is detected.
foreachwiki maintenance/refreshImageMetadata.php --mediatype=DRAWING --mime=image/svg+xml --force --throttle
Unfortunately, there doesn't seem to be a way to select only SVGs of a certain size, so this would reparse all SVGs, which is quite a lot. I don't think that will be a problem, because SVGs are a relatively tiny share of the uploads, but it's always a bit of a gamble.
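If a list of file names and sizes were available from some other source, the set worth refreshing could be narrowed down: a file no larger than the old cutoff was already read in full, so only files above it can gain newly detected languages. The Python sketch below shows that filter. It is not an option of refreshImageMetadata.php; the function name, the input format, and the default old cutoff of 256 KiB are assumptions for illustration.

```python
# Assumed previous cutoff from this task; pass another value if it differs.
OLD_CUTOFF_BYTES = 256 * 1024


def files_worth_refreshing(files, old_cutoff=OLD_CUTOFF_BYTES):
    """Given (name, size_in_bytes) pairs, return the names of files larger
    than the old cutoff, i.e. files that were previously only partly read."""
    return [name for name, size in files if size > old_cutoff]


if __name__ == "__main__":
    # Hypothetical file names and sizes, for illustration only.
    example = [("small.svg", 40_000), ("map.svg", 3_700_000)]
    print(files_worth_refreshing(example))
```

Files above the new cutoff stay in the list on purpose, since their first <switch> may still sit between the old and new limits.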