CLDR 48 Release Note

No.	Date	Rel. Note	Data	Charts	Spec	Delta	GitHub Tag	Delta DTD	CLDR JSON
48	2025-10-XX	v48	~~CLDR48~~	Charts48	LDML48	Δ48	release-48-beta1	ΔDtd48	48.0.0-BETA1

BETA DRAFT

Overview

Unicode CLDR provides key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR 48 was an open submission cycle allowing contributors to supply data for their languages via the CLDR Survey Tool — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Changes

The most significant changes in this release are:

Updated for Unicode 17, including new names and search terms for new emoji, a new sort order, and Han → Latin romanization additions for many characters.
Updated to the latest external standards and data sources, such as the language subtag registry, UN M49 macro regions, ISO 4217 currencies, etc.
Many enhancements of the CLDR specification (LDML), including:
- TBD A summary of the changes will be added for the spec-beta. In the meantime, see the Modifications section.
Many additions to language data including:
- Likely Subtags, for deriving the likely script and region from the language (used in many processes).
- Language populations in countries: significant updates to improve accuracy and maintainability.
New formatting options
- Rational number formats added, allowing for formats like 5½.
- For timezones, usesMetazone adds two new attributes stdOffset and dstOffset so that implementations can use either “vanguard” or “rearguard” TZDB data sources.
- There are now combination formats for relative dates + times, such as “tomorrow at 12:30”.
- Additional units were added for scientific contexts (coulombs, farads, teslas, etc.) and for English systems (fortnights, imperial pints, etc.). The new non-metric units were translated in only a few languages.
Many corrections and updates for Metazone data and for calendars (including removal of eras and fixes to start dates).
This is the first release where the new CLDR Organization process is in place for DDL languages. As a result, several locales were able to reach higher levels (see below).

For more details, see below.

Locale Coverage Status

The following shows the coverage levels per language in this version of CLDR.

The With Script column indicates which of the Count locales are language-script variants.
- For example, zh_Hant and zh(_Hans) add two to the Count, and one to With Script.
The Regional Variants column indicates the number of other regional locales: none are in Count.
- For example, there are 46 locales for French, such as fr, fr_CA, fr_BE, etc., so that adds 46 to the RV column for Modern.

Current Levels

Count	With Script	Regional Variants	Level	Usage	Examples
104	5	305	Modern	Suitable for full UI internationalization	Afrikaans, shqip, አማርኛ, ‫العربية‬, հայերեն, অসমীয়া, azərbaycan
13	0	1	Moderate	Suitable for “document content” internationalization, eg. in spreadsheet	Akan, Cebuano, Māori, тоҷикӣ
57	10	22	Basic	Suitable for locale selection, eg. choice of language on mobile phone	भोजपुरी, बर’, डोगरी, eʋegbe, Gã, हरियाणवी

Changes

±	New Level	Locales
📈	Modern	Akan, Bashkir, Chuvash, Kazakh (Arabic), Romansh, Shan, Quechua
📈	Moderate	Anii, Esperanto
📈	Basic	Buriat, Piedmontese, Sicilian, Tuvinian
📉	Basic*	Baluchi (Latin), Kurdish

* Note: Two locales dropped in coverage (📉), from Moderate to Basic. Each release, the number of items needed for Modern and Moderate increases. So locales without active contributors may drop down in coverage level.

For a full listing, see Coverage Levels.

Specification Changes

The following are the most significant changes to the specification (LDML).

Locale Identifiers and Names

Display Name Elements Described the usage of the language element menu values core and extension, and alt="menu". Also revamped the description of how to construct names for locale IDs, for clarity.

Misc.

Character Elements Added new exemplar types.
Person Name Validation Added guidance for validating person names.

DateTime formats

Element dateTimeFormat Added a new type relative for relative date/times, such as “tomorrow at 10:00”, and updated the guidelines for using the different dateTimeFormat types.
timeZoneNames Elements Used for Fallback Added the gmtUnknownFormat, to indicate when the timezone is unknown.
Metazone Names Added usesMetazone, to specify which offset is considered standard time, and which offset is considered daylight.
Time Zone Format Terminology Added the Localized GMT format (and removed the Specific location format). This affects the behavior of the z timezone format symbol. There is also now a mechanism for finding the region code from a short timezone identifier, which is used for the non-location formats (generic or specific).
Calendar Data Specified more precisely the meaning of the era attributes in supplemental data, and how to determine the transition point in time between eras.
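
The choice between the existing combining patterns and the new relative type of dateTimeFormat can be sketched as follows. This is a minimal illustration only: the pattern strings, the PATTERNS table, and the combine function are hypothetical and are not taken from CLDR data. The one detail relied on from LDML is that {0} stands for the time and {1} for the date in these patterns.

```python
# Hypothetical combining patterns for illustration only; real values come
# from the dateTimeFormat elements of a locale (and use quoted literals).
PATTERNS = {
    "standard": "{1}, {0}",    # fixed date, e.g. "March 20, 12:30"
    "relative": "{1} at {0}",  # relative date, e.g. "tomorrow at 12:30"
}


def combine(date_text: str, time_text: str, date_is_relative: bool) -> str:
    """Join an already formatted date and time; {0} is time, {1} is date."""
    key = "relative" if date_is_relative else "standard"
    return PATTERNS[key].replace("{1}", date_text).replace("{0}", time_text)


print(combine("tomorrow", "12:30", True))
print(combine("March 20", "12:30", False))
```

Keeping the relative pattern as a separate entry lets a language that needs different wording after a relative date supply it without affecting the fixed-date pattern.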

Numbers

Plural rules syntax Added substantial clarifications and new examples. The order of execution is also clearly specified.
Compact Number Formats Specified the mechanism for formatting compact numbers more precisely.
Rational Numbers Added support for formatting fractions like 5½.
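
As a rough illustration of how a rational pattern and an integer-plus-rational pattern might be combined, here is a minimal sketch. The two pattern strings are hypothetical stand-ins rather than CLDR data, the function name is invented for this example, and only non-negative values are handled.

```python
from fractions import Fraction

# Hypothetical patterns for illustration; real values would come from the
# rationalPattern and integerAndRationalPattern elements of a locale.
RATIONAL_PATTERN = "{0}/{1}"              # numerator, denominator
INTEGER_AND_RATIONAL_PATTERN = "{0} {1}"  # integer part, formatted fraction


def format_rational(value: Fraction) -> str:
    """Format a non-negative Fraction as an integer plus a proper fraction."""
    whole, remainder = divmod(value.numerator, value.denominator)
    if remainder == 0:
        return str(whole)
    fraction = RATIONAL_PATTERN.format(remainder, value.denominator)
    if whole == 0:
        return fraction
    return INTEGER_AND_RATIONAL_PATTERN.format(whole, fraction)


print(format_rational(Fraction(11, 2)))
```

A production formatter would also localize the digits and could use the superscript/subscript variant to produce forms like 5½.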

Units of Measurement

Unit Syntax Simplified the EBNF product_unit and added an additional well-formedness constraint for mixed units.
Unit Identifier Normalization Modified the normalization process.
Mixed Units Modified the guidance for handling precision.

MessageFormat

Syntax and data model errors now must be prioritized over other errors (#1011)
The Default Bidi Strategy is now required and default (#1066)
The :offset function (previously named :math) is now available as Stable (#1073)
The :datetime, :date, and :time functions are updated to build on top of semantic skeletons (#1078, #1083)
:percent is added as a new Draft function (#1094)

There are many more changes that are important to implementations, such as changes to certain identifier syntax and various algorithms. See the Modifications section of the specification for details.

Data Changes

Locale Changes

General

Languages that reached Basic in the last release have their names translated at Modern Coverage in this release.
Compound language names now have “core” and “extension” variants for more uniform formats in menus and lists. The description of how to format names for locale IDs has been extended and clarified.
- For example, that allows the Kurdish variants to have a uniform format where more than Kurmanji is displayed.
  - Kashmiri
  - Kurdish (Kurmanji, Latin)
  - Kurdish (Central, Arabic)
  - Kurdish (Southern, Arabic)
  - Kyrgyz
Many features selectable with locale options now have scope="core" names, for better presentation in menus.
- Calendar names, collation names, emoji options, currency formats, hour-cycle options, and so on.
- Rather than seeing
  - Calendar
    - Buddhist Calendar
    - Chinese Calendar
    - Gregorian Calendar
- Users can see
  - Calendar
    - Buddhist
    - Chinese
    - Gregorian
Recent or upcoming currency names were added (XCG, ZWG).
To match ISO, added translations for the region Sark (CQ).
There are now combination formats for relative dates + times, such as “tomorrow at 12:30”. In some languages the use of a relative date such as “tomorrow” or “2 days ago” requires a different combining pattern than for a fixed date like “March 20”. A new “relative” variant is introduced to allow for those languages.
Some additional flexible date formats (also known as availableFormats) were added.
Many locales had seldom-used short timezone abbreviations (such as EST) removed, or moved to sublocales that use them.
The currency-number formats for alphaNextToNumber, noCurrency, and compact currency formats are now generated from other data for consistency. The alphaNextToNumber patterns allow for a space between letter currency symbols and numbers. For example, “USD 123” vs “$123”.
The tooling now makes it easier to see whether a space is breaking or non-breaking, or a thin version of either. As a result, usage is now more consistent in many locales.
Names and search keywords were added for the new Unicode 17 emoji.
For the Etc/Unknown timezone, the exemplarCity name was changed from “Unknown City” to “Unknown Location” for clarity.
Rational number formats were added, allowing for formats like 5½.
Certain concentration units were reworked, for “parts per million”, “parts per billion”.
Additional units were added for scientific contexts (coulombs, farads, teslas, etc.) and for English systems (fortnights, imperial pints, etc.). However, translation of these English system names was not required.
Additional guidance on translation was added, leading to refined translations or transcreations.
SIL contributed exemplar data for 860 new or updated locales. The ones that don’t have other locale data are in the /exemplars/ directory (parallel to /common).
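
The spacing behavior of the alphaNextToNumber patterns mentioned above (“USD 123” vs “$123”) can be sketched as follows. The pattern strings and their {sym}/{num} placeholders are invented for this sketch and are not CLDR pattern syntax, and the sketch only covers a currency symbol placed before the number.

```python
# Hypothetical, simplified patterns; "{sym}" and "{num}" are placeholders
# invented for this sketch, not CLDR number pattern syntax.
PATTERNS = {
    "standard": "{sym}{num}",
    "alphaNextToNumber": "{sym} {num}",
}


def format_currency(symbol: str, number_text: str) -> str:
    """Pick the spaced pattern when a letter would touch the number."""
    # The symbol is a prefix here, so its last character sits next to the number.
    key = "alphaNextToNumber" if symbol[-1].isalpha() else "standard"
    return PATTERNS[key].format(sym=symbol, num=number_text)


print(format_currency("USD", "123"))
print(format_currency("$", "123"))
```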

Specific Locales

Kurdish (Kurmanji) ku split from 1 locale ku_TR into 5 locales across 2 scripts and 4 countries. (CLDR-18311)
- ku_Latn_TR: Kurdish (Kurmanji, Latin alphabet, Turkey) default for Kurdish (Kurmanji) ku and ku_Latn
- ku_Latn_SY: Kurdish (Kurmanji, Latin alphabet, Syria)
- ku_Latn_IQ: Kurdish (Kurmanji, Latin alphabet, Iraq)
- ku_Arab_IQ: Kurdish (Kurmanji, Arabic writing, Iraq), default for Kurdish (Kurmanji, Arabic writing) ku_Arab
- ku_Arab_IR: Kurdish (Kurmanji, Arabic writing, Iran)
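
The defaults listed above can be pictured as a small lookup, shown here as a sketch. The table and function names are illustrative only; real implementations would obtain the same result from the likely subtags data rather than from a hard-coded table.

```python
# Locales and defaults copied from the list above; names are illustrative.
KU_LOCALES = {
    "ku_Latn_TR", "ku_Latn_SY", "ku_Latn_IQ", "ku_Arab_IQ", "ku_Arab_IR",
}
KU_DEFAULTS = {
    "ku": "ku_Latn_TR",
    "ku_Latn": "ku_Latn_TR",
    "ku_Arab": "ku_Arab_IQ",
}


def resolve_ku(locale_id: str) -> str:
    """Expand a Kurdish (Kurmanji) locale ID to its full default form."""
    if locale_id in KU_LOCALES:
        return locale_id
    if locale_id in KU_DEFAULTS:
        return KU_DEFAULTS[locale_id]
    raise ValueError(f"not covered by this sketch: {locale_id}")


print(resolve_ku("ku"))
print(resolve_ku("ku_Arab"))
```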

For a full listing, see Delta Data.

DTD Changes

For a full listing, see Delta DTDs.

ldml

The explanations of usage are in the Locale Changes section.

exemplarCharacters — added more type values:
- numbers-auxiliary — for number characters that are not ‘core’ to the language, but sometimes used (like regular auxiliary).
- punctuation-auxiliary — for punctuation characters that are not ‘core’ to the language, but sometimes used (like regular auxiliary).
- punctuation-person — for the limited set of punctuation characters used in person name fields: eg, “Jean-Luc”, “MD, Ph.D.”.
dateTimeFormat — added a relative type value for combining time and date.
gmtUnknownFormat — element was added — Indicating that the timezone is unknown (as opposed to absent from the format).
language — added more menu values: core and extension.
type — added a core scope value.
numbers — added rationalFormats sub-elements: rationalPattern, integerAndRationalPattern (with an alt="superSub" variant), rationalUsage.
rbnf/rulesetGrouping — added rbnfRules sub-element:
- This “flattens” the rules into a format that is easier for implementations to use directly.

supplementalData

era — the range of code values now allows two letters before the first hyphen.
languageData — the territories attribute in supplementalData.xml was deprecated and data using it removed. The definition was unclear and prone to misunderstanding; the more detailed data is in territoryInfo. (CLDR-5708)
usesMetazone — adds two new attributes stdOffset and dstOffset so that implementations can use either “vanguard” or “rearguard” TZDB data sources.
numberingSystem — Unicode 17 data was added.
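
One way an implementation might use stdOffset and dstOffset from usesMetazone is to classify the current UTC offset by value, instead of trusting the daylight flag of whichever TZDB variant it was built with. The sketch below assumes the offsets have already been parsed into minutes; that representation, the function name, and the sample values are assumptions made for illustration and do not reflect the attribute syntax.

```python
from typing import Optional


def name_type_for_offset(utc_offset_min: int, std_offset_min: int,
                         dst_offset_min: int) -> Optional[str]:
    """Classify an offset as 'standard' or 'daylight' by comparing values,
    without relying on the DST flag of the TZDB build in use."""
    if utc_offset_min == std_offset_min:
        return "standard"
    if utc_offset_min == dst_offset_min:
        return "daylight"
    return None  # matches neither; the caller needs another fallback


# Hypothetical zone whose standard offset is UTC+0 and daylight offset UTC+1.
print(name_type_for_offset(60, 0, 60))
```

Because the comparison uses only offsets, the result is the same whether the underlying data models the summer or the winter period as the one with the daylight flag set.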

ldmlBCP47

type — adds a new attribute region, for determining the region from short timezone IDs when not derivable from the first two characters.
keyboard3@conformsTo — is updated to allow “48”.
hc — adds values c12 and c24 as Technical Preview. Also see the note about h24 in “V49 advance warnings”. (CLDR-18894)
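
The lookup enabled by the new region attribute on type can be sketched as follows. The override table stands in for the attribute data; its single entry is invented for illustration and is not CLDR data, and the function name is hypothetical.

```python
# Hypothetical override table standing in for the new region attribute;
# the entry below is invented for illustration and is not CLDR data.
REGION_OVERRIDES = {"zzexmpl": "001"}


def region_for_short_tz(short_id: str) -> str:
    """Return the region for a short timezone ID."""
    override = REGION_OVERRIDES.get(short_id)
    if override is not None:
        return override
    # Otherwise derive it from the first two characters, e.g. "clcxq" -> "CL".
    return short_id[:2].upper()


print(region_for_short_tz("clcxq"))
```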

BCP47 Data Changes

For a full listing, see BCP47 Delta.

nu-tols numbering system for Tolong Siki digits
One additional zone: America/Coyhaique = tz-clcxq
Seven region attributes for determining regions for timezones
Three additional aliases

Supplemental Data Changes

For a full listing, see Supplemental Delta.

Identifiers

Added aliases/deprecations for languages (dek, mnk, nte).
Updated to the latest language subtag registry, with various additions and deprecations.
Updated to the ISO currency data, with various additions and deprecations.
Added unit IDs part, part-per-1e6, part-per-1e9, cup-imperial, fluid-ounce-metric, … with conversions.
- deprecated unit IDs permillion, portion, portion-per-1e9, 100-kilometer.
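
Code that stores unit IDs from earlier releases could normalize them through an alias table, roughly as sketched here. The two mappings are the replacements spelled out in the Migration section; the table and function names are illustrative, and the other deprecated IDs are left out because their exact replacements are not listed in these notes.

```python
# Replacements stated in the Migration section of these notes; other
# deprecated IDs are omitted because their replacements are not listed.
UNIT_ALIASES = {
    "permillion": "part-per-1e6",
    "portion-per-1e9": "part-per-1e9",
}


def canonical_unit_id(unit_id: str) -> str:
    """Map a deprecated unit ID to its replacement; pass others through."""
    return UNIT_ALIASES.get(unit_id, unit_id)


print(canonical_unit_id("permillion"))
```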

Language Data

language_script.tsv updated to include only one “Primary” writing system for languages that used to have multiple options (CLDR-18114). Notable changes are:
- Punjabi pa has changed the primary script to Gurmukhi Guru because widespread usage is in the Gurmukhi script. While most speakers are in Pakistan PK, written usage remains Gurmukhi.
- Azerbaijani az and Northern Kurdish ku are primarily written in Latin Latn.
- Chinese languages zh, hak, and nan are matched to Simplified Han writing Hans, except Cantonese yue, which is known for its preference for Traditional Han writing Hant.
- Hassiniyya mey was missing significant data; it should be associated with the Arabic Arab writing system by default, not Latin Latn.
5 new language distance values are added (for fallback to zh).
Substantial updates to Language Info: additional languages in countries; revised population values, writing percentages, literacy percentages, and official status values.

Likely Subtags

Many additions: see Likely Subtags
Errors in likely subtags addressed
- The default language for Belarus BY is now Russian ru, reflecting modern usage. (CLDR-14479)
- Literary Chinese lzh was written in Traditional Han writing Hant. (CLDR-16715)
Likely subtags updated because of prior mentioned primary script matches.
- Northern Kurdish ku is now matched to Cyrillic writing in the CIS countries. (CLDR-18114)
- Hassiniyya mey updated to default to mey_Arab_DZ instead of mey_Latn_SN. (CLDR-18114)

Calendars, Timezones, Dayperiods

Many updates and corrections for Metazone data
Many updates to calendars, including the removal of eras and adjustment to era start dates
Day periods for kok, scn, hi_Latn

Plural Rules

Additions for cv, ie, kok, sgs

Currencies

Updates to the latest ISO currencies

Weekdata

IS changed to firstDay=sun
ku_SY adding H and hB

Transforms

For a full listing, see Transforms Delta.

Fixed problem in Gujarati → Latin romanization, with ૰
Updated to latest Unicode 17 data for Han → Latin, with very many changes.

Number Spellout Data Changes

The biggest change is to the format, which has been “flattened” for easier use by clients.

JSON Data Changes

RBNF
- Just as with the RBNF data format change in XML CLDR-8909, the JSON data also has a change in structure. CLDR-18956.
- Below is an example of the changed data format.
- The new data item is the _rbnfRulesFile key. Its value is the name of a data file in the same directory, containing the raw rules. (Note: Do not interpret the .txt file’s name in any way.)
- The previous data format is included for this release, but will be removed in a future release. In this case, the %digits-ordinal (and any other such keys) will be removed.

{
    "rbnf": {
      "OrdinalRules": {
        "%digits-ordinal": [
          [
            "-x",
            "−→→;"
          ],
          [
            "0",
            "=#,##0=;"
          ]
        ],
        "_rbnfRulesFile": "ar-OrdinalRules.txt"
      }
    }
}

The ar-OrdinalRules.txt file contains all rules for this locale:

%digits-ordinal:
-x: −>>;
0: =#,##0=;
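
A client could pick up the raw rules roughly as follows. This is a sketch that assumes the simplified JSON shape shown in the example above (published files may nest the groupings differently), and the function name is hypothetical. The file name is used only to locate the file and is not interpreted.

```python
import json
from pathlib import Path


def load_raw_rbnf_rules(json_path: Path) -> dict[str, str]:
    """Map each rule grouping (e.g. "OrdinalRules") to its raw rules text."""
    data = json.loads(json_path.read_text(encoding="utf-8"))
    rules: dict[str, str] = {}
    for grouping, content in data["rbnf"].items():
        if not isinstance(content, dict):
            continue  # skip anything that is not a rule grouping
        file_name = content.get("_rbnfRulesFile")
        if file_name is None:
            continue  # data without the new key
        # The file lives in the same directory; its name is treated as opaque.
        rules_path = json_path.parent / file_name
        rules[grouping] = rules_path.read_text(encoding="utf-8")
    return rules
```

Calling load_raw_rbnf_rules(Path("ar.json")) on data shaped like the example would return a dictionary with one entry, "OrdinalRules", holding the contents of ar-OrdinalRules.txt; the path ar.json is a placeholder.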

File Changes

The following files are new in the release:

Level 1	Level 2	Level 3	Files
common	annotations		ba.xml, shn.xml, sv_FI.xml, syr.xml
	casing		sgs.xml
	collation		blo.xml, sgs.xml
	main		bqi_IR.xml, bqi.xml, bua_RU.xml, bua.xml, en_EE.xml, en_GE.xml, en_JP.xml, en_LT.xml, en_LV.xml, en_UA.xml, kek_GT.xml, kek.xml, ku_Arab_IQ.xml, ku_Arab_IR.xml, ku_Arab.xml, ku_Latn_IQ.xml, ku_Latn_SY.xml, ku_Latn_TR.xml, ku_Latn.xml, lzz_TR.xml, lzz.xml, mww_Hmnp_US.xml, mww_Hmnp.xml, mww.xml, oka_CA.xml, oka_US.xml, oka.xml, pi_Latn_GB.xml, pi_Latn.xml, pi.xml, pms_IT.xml, pms.xml, sgs_LT.xml, sgs.xml, suz_Deva_NP.xml, suz_Deva.xml, suz_Sunu_NP.xml, suz_Sunu.xml, suz.xml
	testData	personNameTest	ba.txt, blo.txt, cv.txt, kk_Arab.txt, kok_Latn.txt, rm.txt, shn.txt
	uca		FractionalUCA_blanked.txt

Migration

Number patterns that did not have a specific numberSystem (such as latn or arab) had been deprecated for many releases, and were finally removed.
Language and territory data in languageData and territoryInfo received significant updates to improve accuracy and maintainability (CLDR-18087).
The likely language for Belarus changed to Russian (CLDR-14479).
Using Time Zone Names Removed the “specific location format”, and modified the fallback behavior of ‘z’.
Unit Identifier Normalization Modified the normalization process.
The era element type attributes no longer need to start at 0. Implementations that store eras in arrays indexed by type may need to be adjusted.
The default week numbering mechanism changes to be identical to ISO instead of being based on the calendar week.
Deprecated unit IDs permillion, portion, portion-per-1e9. These are replaced by IDs using part. Also deprecated the compound component 100-kilometer, since (certain) integers are allowed in unit ID denominators.
For compact short currency formatting (such as “$13B”), implementations should use the new alphaNextToNumber variants to get the correct spacing.
The unit identifiers for the following changed for consistency. As with all such changes, aliases are available to permit parsing and formatting to work across versions.
- permillion changed to part-per-1e6; English values remain “parts per million”, “{0} part per million”, etc.
- portion-per-1e9 changed to part-per-1e9; English values remain “parts per billion”, “{0} part per billion”, etc.
- part used for constructing arbitrary concentrations such as “parts per 100,000”; English values “parts”, “{0} part”, etc.
English and/or root names of many exemplar cities and some metazones changed. This was typically to move towards the official spelling in the country in question, such as retaining accents, or to add landscape terms such as “Island”. For example: El Aaiun → El Aaiún; Casey → Casey Station; Hovd Time → Khovd Time.
A few additional availableFormat and interval format patterns have been added, such as GyMEd and Hv, to fill some gaps.
The metazone for Hawaii has changed.
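
For the era change noted above, one possible adjustment is to key era data by type code rather than by array position, as in this sketch. The era codes and names are made up for illustration and the names ERA_NAMES_BY_TYPE and era_name are hypothetical.

```python
from typing import Optional

# Hypothetical era data: type codes need not start at 0 or be contiguous,
# so a list indexed by era type is not a safe container.
ERA_NAMES_BY_TYPE: dict[int, str] = {2: "Example Era A", 5: "Example Era B"}


def era_name(era_type: int) -> Optional[str]:
    """Look up an era name by its type code; None when the code is absent."""
    return ERA_NAMES_BY_TYPE.get(era_type)


print(era_name(5))
```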

V49 advance warnings

The following changes are planned for CLDR 49. Please plan accordingly to avoid disruption.

CLDR-18303 H24 will be deprecated. If it is encountered, it will have H23 behavior. There is no known intentional usage of H24. If you have a current need for H24 instead of H23, please comment on CLDR-18303.
The default week numbering changes to ISO instead of being based on the calendar week starting in CLDR 48 (CLDR-18275). The calendar week will be more clearly targeted at matching usage in displayed month calendars.
The pre-Meiji Japanese eras will be removed: there was too much uncertainty in the exact values, and feedback indicated that the general practice for exact pre-Meiji dates is to use the Gregorian calendar.
The major components in supplementalData.xml and supplementalMetadata.xml files are slated to be organized more logically and moved into separate files.
- This will make it easier for implementations to filter out data that they don’t need, and make internal maintenance easier. This will not affect the data: just which file it is located in. Please plan to update XML and JSON parsers accordingly.

Known Issues

CLDR-18219 common/subdivisions data files contained additional values that should not be present. These will be removed in the future, but note that they may be present in the new JSON data:
- Non-subdivisions such as AW: Use the region code AW instead for translation.
- Overlong subdivisions such as fi01: Use the region code AX instead for translation.

Acknowledgments

Many people have made significant contributions to CLDR and LDML. For a full listing, see the Acknowledgments.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.