[css-text] For most languages, hyphens:auto should not hyphenate Capitalized words #3927

Closed
jfkthame opened this issue May 13, 2019 · 43 comments
Labels: Closed Accepted by CSSWG Resolution; Commenter Satisfied (commenter has indicated satisfaction with the resolution / edits); css-text-3 (Current Work); css-text-4; i18n-eurlreq (European language enablement); i18n-tracker (group bringing to attention of Internationalization, or tracked by i18n but not needing response); Testing Unnecessary (memory aid: issue doesn't require tests); Tracked in DoC

Comments

@jfkthame
Contributor

When auto-hyphenation is in use, I believe that in most languages - with German being the major exception - it would be preferable for browsers not to hyphenate capitalized words, which will often be proper nouns. In many cases authors and readers will prefer that names (of people, companies, etc) not be split, and in addition hyphenation rules designed for the "normal" words of a language may fail to hyphenate many names appropriately.

(https://bugzilla.mozilla.org/show_bug.cgi?id=1550532 was recently filed against Gecko about this issue.)

The CSS Text 3 spec explicitly does not specify exactly where hyphenation opportunities occur when hyphens:auto is used. However, I would suggest adding an informative note to the spec, suggesting that browsers may want to suppress auto-hyphenation of capitalized words except when the hyphenation language in use is German.

For CSS Text 4, perhaps a property should be introduced to allow authors to explicitly control this behavior; e.g. hyphenation-capitalized-words: auto | yes | no, where yes and no would have the obvious meaning, and auto would tell the browser to use whatever heuristics it may have, such as considering the current language.
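If such a property were adopted, author usage might look like the sketch below. Note that `hyphenation-capitalized-words` is only the name floated in this comment: it is not defined in any CSS specification and no browser implements it, so these declarations would simply be ignored today. The selectors (`article`, `.legal-notice`, `.narrow-column`) are illustrative assumptions.

```css
/* HYPOTHETICAL: hyphenation-capitalized-words is a proposed name only;
   browsers currently drop these declarations as unknown. */

/* Let the browser apply its own language-dependent heuristics. */
article {
  hyphens: auto;
  hyphenation-capitalized-words: auto;
}

/* Never split capitalized words, e.g. where brand names matter. */
.legal-notice {
  hyphens: auto;
  hyphenation-capitalized-words: no;
}

/* Always allow splitting, e.g. in very narrow columns. */
.narrow-column {
  hyphens: auto;
  hyphenation-capitalized-words: yes;
}
```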

@Crissov
Contributor

Crissov commented May 13, 2019

name, .name,
::proper-noun {
  hyphens: none;
}

if there is sufficient markup or if there was such a semantic pseudo element.

@litherum
Contributor

Is a new property really worth it? Is this something authors are asking to be able to control? Can the browser just do it right in the first place?

@jfkthame
Contributor Author

Well, what's "the right thing" for a browser to do regarding hyphenation of capitalized words? I don't think there's a clear answer to that, although I do think browsers should try for a sensible default behavior, and in https://bugzilla.mozilla.org/show_bug.cgi?id=1550532 we just made the suggested adjustment for Firefox.

The problem is that in some cases authors/users may prefer that proper names not be hyphenated (as requested in the Mozilla bug); we can't reliably identify proper names in general text, but we can use capitalized words as the best available proxy for this (except in German); but this has the drawback that we'll also suppress hyphenation of non-names at the beginning of sentences; in some cases, this trade-off may be too great and it'd be preferable to allow capitalized hyphenation after all. I don't think a single hard-coded behavior will ever satisfy all use cases.

(A further refinement to the heuristic -- not yet implemented -- would be to make the behavior dependent on line width, so that as line width is reduced, constraints on what may be hyphenated are relaxed.)

Note that systems such as TeX (the \uchyph parameter) and InDesign (the "Hyphenate Capitalized Words" option in paragraph formatting) do expose this question to authors, recognizing that there is not a simple "correct" behavior that the application can universally use.

Obviously, authors can override the browser's heuristics by adding markup to individual names; the question here is what kind of default behavior, and how much author control, we can/should offer for (the overwhelming majority of) text that does not have that level of detailed markup.

@SelenIT
Collaborator

SelenIT commented May 14, 2019

I'm not sure that not hyphenating capitalized words in English is a rule and hyphenating them in German is an exception, and not the other way around. At least, AFAIK, in Russian there is no special case for capitalized words regarding hyphenation (only abbreviations are not hyphenated). Maybe a bit more statistics is needed?

@jfkthame
Contributor Author

I don't believe there are (in general) firm rules about this in either direction; it's a judgement call, and may depend on the specific content and the context in which it's being presented, as well as the individual preferences of the author/typographer.

As such, I think the best we can do in CSS is to offer some guidance as to good default behaviors for browsers -- and further information regarding typical usage in various languages may be helpful -- together with adequate controls so that authors can achieve the results they want.

@litherum
Contributor

WebKit just got a bug about this too (possibly filed by the same person) https://bugs.webkit.org/show_bug.cgi?id=197889

@AmeliaBR
Contributor

Note that systems such as TeX … and InDesign … do expose this question to authors.

This is a very good argument for adding a new property. Does anyone have other examples?

@revoltpuppy

Hello, I’m the person filing these bugs. I appreciate the discussion. For the record, here's the bug I sent to Blink, too: https://bugs.chromium.org/p/chromium/issues/detail?id=963039&can=2&q=hyphen%20proper%20nouns

I do recognize that it will be difficult to find the perfect solution that works for everyone, but I think there can be more sensible defaults. People don’t like it when their name gets broken at the end of a line. Companies don’t like it when their own materials add hyphens into the middle of their brand names.

Hyphenation should be a progressive enhancement. Over the last 10+ years, I haven’t been able to use it in a professional setting, because I’m always asked to turn it off the instant someone sees their brand name or their own name broken across a line. That’s not an enhancement. I understand that we can turn it off on a case-by-case basis with .name or something similar, but that puts the burden on content owners to wrap every name in a span. That’s not an enhancement either.

I wonder, too, if we could add a new value to the hyphens property, all, instead of having a whole separate property. auto would be updated to hyphenate capitalized words based on language (e.g. in German, but not English) and all would hyphenate capitalized words regardless of language. Or keep auto as currently defined and add no-capitalized-words as the new value.

@AmeliaBR
Contributor

I wonder, too, if we could add a new value to the hyphens property, all, instead of having a whole separate property.

Note that there are already multiple properties proposed for controlling hyphenation in CSS Text 4, and other open issues suggesting more control. So adding a single new keyword likely wouldn't be enough.
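For context, the hyphenation controls drafted in CSS Text Level 4 look roughly like this. This is a sketch rather than a recommendation: the values are arbitrary, the `.prose` selector is an assumption, and browser support for these properties varies, so check before relying on any of them. None of them addresses capitalized words specifically.

```css
.prose {
  hyphens: auto;

  /* Minimum word length, then minimum characters before and after the break. */
  hyphenate-limit-chars: 8 3 3;

  /* At most two consecutive lines may end in a hyphen. */
  hyphenate-limit-lines: 2;

  /* Avoid hyphenating the last full line of the element. */
  hyphenate-limit-last: always;

  /* Character rendered at the end of a line when a word is hyphenated;
     "auto" defers to the browser's choice for the content language. */
  hyphenate-character: auto;
}
```

A word-length minimum such as 8 would, for instance, leave short names alone regardless of capitalization.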

@fantasai
Collaborator

fantasai commented May 15, 2019

I'm happy to add a note to CSS Text saying that UAs might want to use heuristics to suppress hyphenation in proper nouns, but I don't think we should define those heuristics in the spec.

("Capitalized words except in German" might want to be "Capitalized words except in German and except after periods", or in a CSS-to-PDF renderer used in publication workflows, even "Capitalized words except in German and except after periods unless we saw it capitalized not after a period." I don't think we'll come up with the ideal heuristics here.)

@revoltpuppy

The last one, “Capitalized words, except in German, and except after periods, unless we saw it capitalized not after a period,” is the best heuristic I’ve seen so far, and the fact that it’s used in publication workflows backs that up.

@Crissov
Contributor

Crissov commented May 16, 2019

“Capitalized” probably meaning contains a capital letter, not begins with a capital letter to capture “iTunes” and the likes.

@jfkthame
Contributor Author

That's a good point, although in practice I wonder how many such names are actually long enough that hyphenation rules are likely to apply to them? Current browsers don't appear to find a hyphenation opportunity in "iTunes", for example, regardless of casing.

@jfkthame
Contributor Author

...when using English rules; however, I notice that with lang=de, we can hyphenate "iTu-nes". That's probably not ideal.

@fantasai
Collaborator

@revoltpuppy To be clear, that was a hypothetical example. :) Not very practical for browsers, but much more practical for publication workflows.

@xfq added the i18n-tracker label Jun 1, 2019
@css-meeting-bot
Member

The CSS Working Group just discussed hyphens:auto should not hyphenate Capitalized words, and agreed to the following:

  • RESOLVED: Add A note to the spec and close with no normative change
The full IRC log of that discussion:
<Rossen_> Topic: hyphens:auto should not hyphenate Capitalized words
<Rossen_> github: https://github.com//issues/3927
<una> florian: so the issue being raised is that in some langs, when words are capitalized you should hyphenate and in some they should not
<una> ... we should bake this into the spec
<una> ... i'd like to close this as wontfix or rejected bc we already say this is dict based within the logic of the lang-based resource
<dauwhe> q+
<una> fantasai: I would go a little farther and say that we should only put a note and not change normative requirements and talk about proper nouns
<una> ... it can suggest i.e. in English you may want to supress hyphenation words that are proper nouns and mixed case
<una> ... I would like to leave the heuristics up to the user agent and not bake anything into the spec
<Rossen_> ack dauwhe
<una> dave: in english should capital letters be hyphenated? maybe... I wouldn't want anythign baked into the spec that says what should happen
<astearns> s/dave/dauwhe/
<una> AmeliaBR: the rec is more to add a suggested note to add in your hyphenation dictionaries you should consider this
<una> ... at least one browser has agreed
<una> ... not sure this is a normative requirement
<una> Rossen_: so proposed resolution for this is to add a note and no normative change
<una> RESOLVED: Add A note to the spec and close with no normative change
<una> florian: myles, a while back you raised 3566 - should we reopen?

@asmusf

asmusf commented Jun 6, 2019

...when using English rules; however, I notice that with lang=de, we can hyphenate "iTu-nes". That's probably not ideal.

That hyphenation somehow implies that Germans pronounce the word eye-too-ness instead of eye-toons or eye-tjoons. It seems "not ideal" for reasons other than capitalization; just as loan words generally aren't regular.

@frivoal added the Needs Edits and Testing Unnecessary labels Jun 8, 2019
@frivoal self-assigned this Jul 9, 2019
@arknu

arknu commented Apr 21, 2022

Why would you not want to hyphenate capitalized words? That decision makes absolutely no sense. It seems that, as usual, decisions are taken looking only at English and not taking into account that other languages may have different needs.

For English hyphenation may be a luxury, but for many languages with longer words (Danish, Norwegian, Swedish, Finnish, German and lots more) hyphenation is an absolute necessity for proper text layout, especially on mobile where lines are quite short. You just broke text layout for a large number of languages.

Take this example:
image

A single-word headline which, as you would expect, starts with a capital letter. Not getting hyphenated because of this stupid argument. Did no-one stop to think that not hyphenating the first word in a sentence might be a bad idea?

You absolutely CANNOT use capital letters to detect proper nouns. German capitalizes every single noun and those can be quite long. You cannot make random exceptions for different languages (like German in this case) - the spec should be language-agnostic. The web is supposed to work for all languages, yet we are once again seeing the one-sided American view that "every language must work like English".

And why would you not want to hyphenate proper nouns in the first place? They are words like any other and they need to be hyphenated when they would protrude out of their box.

This decision needs to be reverted ASAP. It makes CSS hyphenation pretty much useless, forcing us to use bloated JS libraries for what should be something that just works in the browser. Word processors have been doing automatic hyphenation for decades, it can't be that hard.

Rather than just randomly deciding not to hyphenate capitalized words, you should have added a property to control it. That way you wouldn't be harming everyone, as is currently the case. You can always use CSS to turn hyphenation off for specific words, if needed. But I cannot override this behavior right now.
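One author-side workaround that does not depend on the browser's heuristics is to mark the break points explicitly with soft hyphens (U+00AD, `&shy;` in HTML), which CSS Text 3 treats as hyphenation opportunities under both `hyphens: manual` and `hyphens: auto`. A minimal sketch, assuming a hypothetical `.headline` class and an English example word; it carries the same per-word markup cost as suppressing hyphenation with a span:

```css
/* Assumed markup (break points inserted by hand or by a build step):
   <h1 class="headline">Incom&shy;prehen&shy;sibil&shy;ities</h1> */
.headline {
  /* Break only at the explicit soft hyphens, so the result does not
     depend on how the browser treats capitalized words. */
  hyphens: manual;
}
```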

@spacecakes

spacecakes commented Apr 21, 2022

☝️ this, but worded differently. Hyphenation (especially machine-determined) is not pretty, but it's nicer than overflow or arbitrarily split-up words.

Does this "fix" even make sense for English? Surely you'd prefer hyphenation over

Incompre
hensibiliti
es

on small screens with large fonts?
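The arbitrary split shown above is roughly what an emergency-wrapping fallback produces when no hyphenation opportunity is available. A sketch of combining the two, assuming a hypothetical `.card-title` selector: hyphenation is requested first, and `overflow-wrap: anywhere` only breaks a word that would otherwise overflow.

```css
.card-title {
  /* Preferred: language-aware breaks, rendered with a hyphen. */
  hyphens: auto;

  /* Last resort: if a word still does not fit (for example because the
     browser declined to hyphenate it), break it at an arbitrary point,
     without a hyphen, rather than letting it overflow. */
  overflow-wrap: anywhere;
}
```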

@asmusf

asmusf commented Apr 21, 2022

In quite a few news paragraphs in German, the proper nouns can be the longest words, like that of politician Sabine Leutheusser-Schnarrenberger (which causes additional issues for typesetting because it's already hyphenated).

That said, I've not been able to quickly find websites or online documents that use automatic hyphenation at all. Does anybody know some good examples (in various languages)?

@revoltpuppy

Unfortunately the updated suggestions for hyphenation have not yet been adopted by all browsers. Safari closed the change request without comment. This means auto-hyphenation is still fraught because some browsers will hyphenate correctly, while others incorrectly hyphenate proper names (in English).

Because the behavior is undesirable in some browsers, most sites are still making do with other workarounds.

@arknu

arknu commented Apr 22, 2022

@revoltpuppy The one case where Safari is correct, then. This change is complete nonsense. Proper nouns should be hyphenated like any other word. They can break text layout just as easily. If there are specific words you don't want hyphenated, use a CSS class and a span to disable hyphenation for that word.

@r12a
Contributor

r12a commented Apr 22, 2022

That said, I've not been able to quickly find websites or online documents that use automatic hyphenation at all. Anybody know some good examples (in various languages)

fwiw, W3C i18n articles use it (see https://www.w3.org/International/articlelist). The following article has a fair number of translations (i'm aware that some of the links need extra clicks - will fix this): https://www.w3.org/International/questions/qa-doc-charset It will depend on which browser you use as to which languages show hyphenation, and some adjustment of the window width is needed: on Firefox i saw hyphenation for Deutsch English Español Français हिन्दी Italiano Polski Português Português-BR Pусский Svenska Українська

@asmusf

asmusf commented Apr 22, 2022

Thanks. In FF, the German version's hyphenation seems fine, but there seems to be an effort to prevent hyphenation of already hyphenated terms. Given that the elements in a hyphenated compound are not always short, the result is some uneven lines.

Look at the hyphenation of "Dokument-Zeichensatz" to see what I mean. It happens to work out in the title, because breaking a title into two balanced parts is better than filling one line and having just a bit left over. But that should have been the result of an esthetic rule about type balance for headers. (In the body of the article the same rule leads to some bad line widths.)

It's a well-meaning rule (to avoid two different kinds of hyphen in the same compound) but you'd never tolerate the effect in a book. So why in a browser?

@jfkthame
Contributor Author

Look at the hyphenation of "Dokument-Zeichensatz" to see what I mean. It happens to work out in the title,

The title uses hyphens: none in its CSS. If you disable that rule, and make it sufficiently narrow, you can get a result like

Do-
ku-
ment-
Zei-
chen-
satz

but I'm not sure that browsers (or authors) should be too concerned about optimizing for such extreme cases.

It's a well-meaning rule (to avoid two different kinds of hyphen in the same compound) but you'd never tolerate the effect in a book. So why in a browser?

When you're typesetting a book, you have the luxury of making individual decisions for the specific layout (font, text size, line width, etc) that you're producing. So you can decide whether it's preferable to split

... Dokument-
Zeichensatz ...

at the explicit hyphen, preserving each component intact but perhaps leaving the line that ends "Dokument-" a bit short, or to hyphenate one of the components, e.g. resulting in

... Dokument-Zei-
chensatz ...

in order to more precisely fill the lines. And you don't have to worry that the reader will suddenly zoom the text (or resize the page) such that only a dozen characters fit on each line.

For a browser that has to dynamically lay out the text, I don't think it's easy to say, in general, which is better; it'll depend on the relative weight given to various subjective factors, and the appropriate balance is likely to be different for very narrow columns than for more "normal" page sizes.

@hftf

hftf commented Apr 22, 2022

(For controlling line breaking at explicit hyphens, see also the open issue #3434.)

@asmusf

asmusf commented Apr 22, 2022

Even books aren't necessarily static documents any more, if you consider something other than a novel. Non-fiction books may be updated after their first release, and at that point any custom, manual optimizations may bite you.

At the same time, books remain rather unforgiving with respect to badly justified paragraphs. And even ragged-right may look too ragged if you have any rule that rigidly prevents splitting long words.

The type of algorithm that will produce superior results is one that uses weights, and balances a poor choice of hyphenation location against other factors such as uneven line length (and, where applicable, unacceptably loose or tight text as a result of justifying a line).

Such an algorithm should be able to cope well with emergency situations.

In the example the title is a single line. If the window gets too narrow, a weight-based algorithm should be able to detect that

Dokument-
Zeichensatz

works better than

Dokument-Zeichen-
satz

while for a really narrow column the following may be more natural than having two lines with overflow:
Doku-
ment-
Zeichen-
satz

The way to influence such an algorithm would be by raising/lowering the priority (weight) for various line-breaking and hyphenation opportunities, but not by crudely turning some of them off or on.

The key would be to define the controls in relative mode, so that they are not dependent on any absolute weights for a given implementation.

@revoltpuppy

This change is complete nonsense. Proper nouns should be hyphenated like any other word. They can break text layout just as easily. If there are specific words you don't want hyphenated, use a CSS class and a span to disable hyphenation for that word.

I mean, I could just say if there are specific words you want hyphenated, you should use a class and a span, too. It’s just impractical to put a span around every proper noun, no matter which way you look at it. Names as small as six letters could become hyphenated, and because of that almost nobody (writing in English) uses hyphens: auto, which is the problem in the first place. There are problems with breaking the layout and there are problems with incorrect hyphenation, and we’re trying to find the best balance for the most people.

In English, it is wrong to hyphenate names. In German, it is preferred. The spec already suggests that when German is detected, yes, proper nouns should be hyphenated. If there are other languages where this is an issue, create a change request so that it can be discussed and those languages can be fixed. If your browser isn’t following the spec’s suggestion, put in a bug fix request with that browser.

@arknu

arknu commented Apr 22, 2022

@revoltpuppy I strongly disagree that the CSS spec should in any way make recommendations for a specific language. It seems that whoever made this change knew English and a little German and left it at that, not bothering to research the issue more broadly. If that had been done, it would have been discovered immediately that this would not work in most languages.

It's one thing to not hyphenate words with a capital letter in a sentence. But not hyphenating the first word in a sentence is the really critical bit here. Whoever thought that was a good idea?

If English is the only language where you don't want to hyphenate proper names, then why was the change not restricted to English? Why the specific exception for German when many other languages are also affected? The process here is clearly broken. This needs to be acknowledged by the working group and the change reverted so that a proper solution that works for all languages can be worked out.

You have created massive compatibility issues worldwide for years because you didn't want a name hyphenated in English. We finally had working hyphenation in most browsers. Not perfect, but adequate. And then you decided to break it because of some random complaint. Hyphenating capitalized words is NOT a bug, it is a necessary feature!

The arrogance of just assuming that "it works like that in English, then it must be the same in other languages" is really getting annoying. So many stupid design decisions in tech have been made because Americans have no concept of how other languages work. This is just the latest.

If all languages had been taken into account from the beginning, hyphenation would have been implemented 10-15 years ago because hyphenation is so important for laying out text properly in a lot of languages other than English.

@arknu

arknu commented Apr 22, 2022

I have used hyphens:auto on many real-world sites, precisely because it solves a real-world problem in Danish that used to require a heavy JS dependency. This site is an example: https://gaarden.nu/ (it will load hyphenator.js if loaded in a browser that doesn't support hyphens:auto).

And this completely random and unannounced spec change has probably created quite a few layout problems on these sites as they rely on hyphenation to have headlines break properly. In Danish, a headline will very frequently start with a long word.

@revoltpuppy

Believe me, I get that incorrect hyphenation is frustrating. I’ve been there. I’m still there. hyphens: auto deserves a lot of attention, and it could be improved in a number of ways beyond detected language and proper nouns. There are already some useful suggestions posted above.

The good news is, there’s a way to get things fixed, it’s actually pretty simple, and changes can be made pretty quickly! And hurling insults in a closed issue is not that way.

@HeikkiYlipaavalniemi

@revoltpuppy What are your recommendations to move this closed issue forward?

I think we all agree that hyphenation is hard and that the problem is very different in different languages (e.g. English, German, Finnish). Because of this, changes should be tested and considered from multiple angles. In recent years the hyphens: auto feature has been the most effective tool for keeping layouts intact in Finnish. We actually have real words like lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas (for real) in Finnish, which quite easily break the layout. And the change whereby a sentence's first word is not hyphenated breaks things at mobile widths and in grid layouts.

We have used for example https://github.com/mnater/Hyphenopoly as a second tool to make it work for all browsers but the native CSS support is in my opinion the best way to solve this problem.

@asmusf

asmusf commented Apr 23, 2022

I repeat my suggestion that the correct way to make hyphenation decisions uses ranking/weighting/prioritizing to meet the conflicting goals of avoiding a loose line on the one hand and awkward hyphenation on the other.

Any method that attempts to do this with on/off switches for selecting features will get things badly wrong - and the edge cases will show up.

So you get nonsense like not hyphenating the only word on the line, even if it exceeds total line length just because it fits some on/off criterion.

Or nonsense like a half-empty line.

The right way, clearly, is to recognize that at some point, avoiding an extremely unbalanced line (or avoiding overflow) takes precedence over avoiding things like hyphenating a name (or avoiding hyphenating a hyphenated compound).

Those restrictions are fine as long as they don't cause edge cases. So the way to deal with that is to have a way for any automated setting to override them in layout emergencies.

If we had this understanding then we would most probably not have this discussion, because even a non-optimal set of priorities (paying attention to the rules for one language over others) would not have managed to cause such extreme edge cases.

@arknu

arknu commented Apr 23, 2022

@HeikkiYlipaavalniemi For me, the way forward is pretty clear: The Working Group needs to acknowledge that this was not thought through properly and that it was a mistake to change the spec in the way it was. We all make mistakes occasionally, and as long as we own up to our mistakes, that is OK. From that, it follows that the change should be reverted as soon as possible, encouraging browser vendors to make the change quickly. That will leave us with a hyphenation system shipping in browsers that works for most languages in most cases.

Then, careful thought should be given to how the specific issues (like not wanting certain words hyphenated) can be addressed in a language-agnostic way. This should involve a group of people from around the world with experience with various different languages so that a solution can be designed taking into account the needs of all languages. This might involve adding additional properties to control various aspects of hyphenation, as seen in page layout software like Adobe InDesign.

@frivoal
Collaborator

frivoal commented Apr 23, 2022

@arknu

While it is perfectly ok to disagree with something and make your opinion known, I would like to encourage you to tone down the virulence of your messages. The kind of language you have been using is not appropriate.

Furthermore, it seems that you're reacting to the title of this issue, some early comments, or maybe to what certain browsers have been doing on their own, rather than to what has actually been added to the spec, as it doesn't state what you seem to be railing against.

whoever made this change knew English and a little German and left it at that, not bothering to research the issue more broadly. If that had been done, it would have been discovered immediately that this would not work in most languages.

The spec does not give specific rules for English and German. It states that this varies per language, and gives English and German as examples of languages with different expectations.

The spec contains an exhortation to implementers to be mindful of differences between languages, and explicitly does not define the rules to apply in each language.

But not hyphenating the first word in a sentence is the really critical bit here. Whoever thought that was a good idea?

Nobody thought it was a good idea, and the spec does not say anything about not hyphenating the first word of a sentence, nor does it tell browsers not to hyphenate all capitalized words (which would indeed include the first word of a sentence).

As a reminder, this is what was added to the spec.

Authors should correctly tag their content’s language
(e.g. using the HTML lang attribute
or XML xml:lang attribute)
in order to obtain correct automatic hyphenation.

The UA may use language-tailored heuristics
to exclude certain words
from automatic hyphenation.
For example, a UA might try to avoid hyphenation in proper nouns
by excluding words matching certain capitalization and punctuation patterns.
Such heuristics are not defined by this specification.
(Note that such heuristics will need to vary by language:
English and German, for example, have very different capitalization conventions.)

This text does not state that all words with capital letters must be prevented from hyphenating. Nor does it say that that must happen in all languages except German. Please calm down. If you find what the spec does say problematic, please be specific about which part is the source of your issue, and what you think should be stated instead.
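In stylesheet terms, the first paragraph of that note means that `hyphens: auto` depends on the declared content language. A small sketch, assuming the markup carries `lang` attributes and using a hypothetical `.name` class for words the author wants kept whole:

```css
/* Assumed markup:
   <p lang="de">... <span class="name">...</span> ...</p>
   <p lang="en">...</p> */

/* Automatic hyphenation uses the resource for the declared language,
   so each block needs a correct lang attribute. */
:lang(de),
:lang(en) {
  hyphens: auto;
}

/* Explicit opt-out for individual words, independent of any
   capitalization heuristic the browser may apply. This rule has the
   same specificity as the one above, so it must come later to win. */
.name {
  hyphens: none;
}
```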

@arknu

arknu commented Apr 23, 2022

@frivoal Well, then it seems that every browser has misinterpreted the specification massively, since all browsers that support hyphenation have now changed the implementation to only hyphenate capitalized words in German and not in any other language. And no browser hyphenates the first word in a sentence.

I was sent here after filing a Chromium bug, so naturally I assumed that the spec was the cause of the issue. While it is clear that the spec only gives English and German as examples, this has clearly been misinterpreted by implementers. Which just goes to show my point that sufficient knowledge of different languages is needed, both when making specs and when implementing them.

But I agree that from that text in the spec, browsers have no reason to have the implementation they currently have. I will continue the conversation in the various browser issue trackers.

@HeikkiYlipaavalniemi

HeikkiYlipaavalniemi commented Apr 23, 2022

My guess is that the misunderstanding, and the possible problem with the actual implementations, stems from the following comment made by the css-meeting-bot from a discussion in IRC:

The CSS Working Group just discussed hyphens:auto should not hyphenate Capitalized words, and agreed to the following:

RESOLVED: Add A note to the spec and close with no normative change

This seems to be a very different resolution from the text in the specification that @frivoal mentioned.

Browsers seem to have followed the resolution in that comment more than the actual specification.

I agree that the proper way would be to continue the discussion in the browser issue queues, but I think the specification could also have a note about, for example, hyphenating the first words of sentences: they are always capitalized, and in most cases they should be hyphenated by default.

@revoltpuppy

@revoltpuppy What are your recommendations to move this closed issue forward?

Open new issues with the spec and with browser vendors about the bugs you are finding instead.

aarongable pushed a commit to chromium/chromium that referenced this issue Apr 25, 2022
A request was made not to hyphenate capitalized words in
English at crbug.com/963039 because it is likely that they are
proper nouns. The CSS WG discussion[1] concluded to do so for
languages except German. No objections were made and Gecko
shipped in stable. Blink followed to match at r895487
crrev.com/c/2982497.

A following discussion[2] was raised to give the control to
authors. Unfortunately the WG has not concluded yet, but a
good number of opinions not to do so for other languages than
English were raised, in the CSS WG discussion[2] and
crbug.com/1318385.

This patch changes the logic applicable only to English.

[1] w3c/csswg-drafts#3927
[2] w3c/csswg-drafts#5157

Bug: 1318385, 963039
Change-Id: Ifd04b596ee5457e51bff848e7e4b8798bc4a0ffe
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3602055
Auto-Submit: Koji Ishii <kojii@chromium.org>
Reviewed-by: Kent Tamura <tkent@chromium.org>
Commit-Queue: Kent Tamura <tkent@chromium.org>
Cr-Commit-Position: refs/heads/main@{#995561}
@r12a r12a added the i18n-eurlreq European language enablement label Apr 27, 2022
@Atnas-dev

Due to language rules, some browsers have decided that automatic hyphenation is not allowed on capitalised words, as these are seen as "proper nouns".

This is indeed true and respects correct grammar within the designated language.
However, it creates a bug in the browsers themselves: the first word of a sentence (if capitalised) will NEVER be hyphenated.
Also, if an editor decides to write a headline in capitalised style, following the language's rules of capitalisation, the legitimately capitalised words within this heading will never be hyphenated.

Hyphenating these words is essential for supporting creative, appealing, content-rich and language-appropriate hyphenation.
Stating that English will NEVER hyphenate a capitalised word is simply wrong. Stating that English will never hyphenate a proper noun is very true. But detecting a proper noun from the capitalisation of the text alone seems to be a very vague and incorrect indication.

In other words, if this issue cannot be handled correctly, maybe it should not have been handled at all in the first place?

As mentioned in many other discussions about this issue, an additional hyphenation rule could be added to define whether or not capitalised words should be hyphenated in the given element.
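
To make the problem concrete, here is a minimal sketch of the kind of capitalisation gate discussed in this thread. It is not taken from any browser's source: the function names and the `suppressed_langs` parameter are hypothetical, and the default assumes the English-only behaviour described in the Chromium commit message above.

```python
def is_capitalized(word: str) -> bool:
    """Return True when the word starts with an uppercase letter."""
    return bool(word) and word[0].isupper()


def may_auto_hyphenate(word: str, lang: str, suppressed_langs=("en",)) -> bool:
    """Hypothetical gate run before any hyphenation-dictionary lookup.

    Assumption: capitalised words are skipped only for the languages
    listed in `suppressed_langs`, matched on the primary language subtag.
    """
    primary = lang.lower().split("-")[0]
    if primary in suppressed_langs and is_capitalized(word):
        return False  # treated as a probable proper noun
    return True


# The first word of a sentence is blocked purely because of its capital
# letter, even though it is not a proper noun.
sentence = "Internationalization is hard"
first_word = sentence.split()[0]
print(may_auto_hyphenate(first_word, "en"))          # blocked
print(may_auto_hyphenate(first_word.lower(), "en"))  # allowed
print(may_auto_hyphenate(first_word, "de"))          # allowed
```

A gate like this cannot tell a sentence-initial word or a title-case heading from a proper noun, and it also lets through names such as iTunes that start with a lowercase letter.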

@hanshillen

Labels in UI controls (links, buttons, tabs, etc.) are generally capitalized, regardless of language, or whether or not they're proper nouns. Similarly, headings often use title case.

The hyphens property would have been ideal for preventing long words in headings and controls from overflowing or getting cut off when zooming to large magnification factors, ensuring a template complies with WCAG SC 1.4.10 (Reflow).

The property is not useful if it only works some of the time, with the same word hyphenated or not depending solely on whether it starts with a capital letter. It's not feasible to expect content providers to write everything in lower case. This means we'll have to fall back on using word-break: break-all at narrow viewport widths, which is much less ideal than hyphens could have been.

If there was an extra value that ignores case (e.g., hyphens: all) that a developer could opt into, it would solve everything.

mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this issue Oct 14, 2022
This patch disables automatic hyphenation for capitalized
words. Originally raised to Firefox[1], CSS WG resolved[2].

The logic matches Firefox. There were some discussions about
more heuristic rules to detect proper nouns (e.g., iTunes) and
considerations for other languages than German. We can tweak
the rules as they come up.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1550532
[2] w3c/csswg-drafts#3927

Bug: 963039, 973102
Change-Id: I437a98a3c6eacdf4b027c622e5f60bdd056a57b8
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2982497
Reviewed-by: Yoshifumi Inoue <yosin@chromium.org>
Commit-Queue: Koji Ishii <kojii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#895487}
NOKEYCHECK=True
GitOrigin-RevId: 152b45f49f0a3f53645c3b56036dcf188187cb55
mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this issue Oct 14, 2022
NOKEYCHECK=True
GitOrigin-RevId: 92a0834acb49360fe1e2bd212484ca4fef9fc2ab
@rdhelms

rdhelms commented Sep 26, 2024

I just went down this rabbit hole of

  1. being confused why a long non-linguistic string broken at the end of a line only gets a hyphen when lowercased
  2. realizing that a workaround is to use lang="de"...?

I agree with @hanshillen that hyphens: all is exactly what I would have wanted and expected. Is something like that being discussed officially anywhere? My desired behavior with that would simply be for hyphens to be inserted anywhere that a string is broken at the end of a line.
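
As a rough illustration of the behaviour asked for here, the sketch below inserts a hyphen wherever a string has to be broken to fit a line, regardless of case or language. It is only a toy model under stated assumptions: `break_with_hyphens` is a hypothetical name, line width is counted in characters rather than measured in rendered text, and no hyphenation dictionary is consulted. No `hyphens: all` value exists in the spec; this only shows the intent.

```python
def break_with_hyphens(text: str, width: int) -> list[str]:
    """Split `text` into lines of at most `width` characters.

    Every line that ends in a forced break gets a trailing hyphen,
    whether or not the string is capitalised or a real word.
    """
    if width < 2:
        raise ValueError("width must fit one character plus the hyphen")
    lines = []
    remaining = text
    while len(remaining) > width:
        # Reserve one column for the hyphen itself.
        lines.append(remaining[: width - 1] + "-")
        remaining = remaining[width - 1:]
    lines.append(remaining)
    return lines


for line in break_with_hyphens("SKU0012345ABCDEFXYZ", 8):
    print(line)
```

Unlike dictionary-based hyphenation, this breaks at arbitrary character positions, which may suit non-linguistic strings such as identifiers but would be wrong for ordinary words.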
