\Encoding

Encoding

Helpers for converting strings between UTF-8, Windows-1252 and ISO 8859-1, and for repairing badly encoded UTF-8.

Examples

http://www.framework2.com.ar/dzone/forceUTF8-es/

Summary

Public methods:
toUTF8()
toWin1252()
toISO8859()
toLatin1()
fixUTF8()
UTF8FixWin1252Chars()
removeBOM()
normalizeEncoding()
encode()

No protected or private methods found.
No public, protected, or private properties found.
No constants found.

Methods

toUTF8()

toUTF8(text)

Function Encoding::toUTF8

This function leaves UTF-8 characters alone, while converting almost all non-UTF-8 characters to UTF-8.

It assumes that the encoding of the original string is either Windows-1252 or ISO 8859-1.

It may fail to convert characters to UTF-8 if they fall into one of these scenarios:

1) when any of these characters: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß is followed by any of these ("group B"): ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿

2) when any of these: àáâãäåæçèéêëìíîï is followed by TWO characters from group B,

3) when any of these: ðñòó is followed by THREE characters from group B.

For example: %ABREPRESENT%C9%BB, that is, «REPRESENTÉ». The "«" (%AB) character will be converted, but the "É" followed by "»" (%C9%BB) is also a valid UTF-8 sequence, so it will be left unchanged.
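The ambiguity in the %C9%BB example can be reproduced with a short Python sketch. It is not the library's code: `to_utf8_sketch` and `decode_legacy_byte` are hypothetical names, and the strategy (keep any well-formed UTF-8 sequence, decode every other byte as Windows-1252) is an assumption based on the description above.

```python
def decode_legacy_byte(b: int) -> str:
    """Decode one byte as Windows-1252, falling back to ISO 8859-1."""
    try:
        return bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes (such as 0x81) undefined.
        return bytes([b]).decode("latin-1")


def to_utf8_sketch(data: bytes) -> str:
    """Hypothetical helper: keep valid UTF-8 sequences, convert everything else."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # Sequence length implied by the lead byte.
        if 0xC0 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF7:
            length = 4
        else:
            length = 1  # ASCII, or a byte that cannot start a UTF-8 sequence
        chunk = data[i:i + length]
        if length > 1 and len(chunk) == length:
            try:
                out.append(chunk.decode("utf-8"))  # well-formed: leave it alone
                i += length
                continue
            except UnicodeDecodeError:
                pass  # not UTF-8 after all: treat the lead byte as legacy text
        out.append(decode_legacy_byte(b))
        i += 1
    return "".join(out)


raw = b"\xabREPRESENT\xc9\xbb"      # «REPRESENTÉ» in ISO 8859-1
intended = raw.decode("latin-1")    # what the author meant
converted = to_utf8_sketch(raw)     # what a "keep valid UTF-8" pass produces
print(intended, converted, converted.encode("utf-8"))
```

The leading 0xAB is converted because it cannot start a UTF-8 sequence, but 0xC9 0xBB decodes cleanly as U+027B, so the intended "É»" is lost. The sketch returns a Python `str`; calling `.encode("utf-8")` on it gives the UTF-8 bytes.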

Parameters

text

Any string.

Returns

The same string, UTF8 encoded

toWin1252()

toWin1252(text)

toWin1252

Parameters

text

Any string.

Returns

The same string, Win1252 encoded

toISO8859()

toISO8859(text)

toISO8859

Parameters

text

Any string.

Returns

The same string, ISO8859 encoded

toLatin1()

toLatin1(text)

toLatin1

Parameters

text

Any string.

Returns

The same string, Latin1 encoded

fixUTF8()

fixUTF8(text)

fixUTF8

Parameters

text

Any string.

Returns

The same string, correctly encoded as UTF-8

UTF8FixWin1252Chars()

UTF8FixWin1252Chars(text)

UTF8FixWin1252Chars

If you received a UTF-8 string that was converted from Windows-1252 as if it were ISO 8859-1 (ignoring the Windows-1252 characters in the range 0x80 to 0x9F), use this function to fix it. See: http://en.wikipedia.org/wiki/Windows-1252
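To make the failure concrete, here is a small Python sketch of the kind of repair described above. It is an illustration, not the library's implementation; `fix_win1252_chars_sketch` is a hypothetical name. When Windows-1252 bytes 0x80 to 0x9F are read as ISO 8859-1 they become the control characters U+0080 to U+009F, and the repair maps each of them back to the character Windows-1252 assigns to that byte.

```python
def fix_win1252_chars_sketch(text: str) -> str:
    """Hypothetical helper: map U+0080..U+009F back to Windows-1252 characters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                ch = bytes([cp]).decode("cp1252")
            except UnicodeDecodeError:
                # A few positions (such as 0x81) are undefined in Python's
                # cp1252 codec; leave those characters untouched.
                pass
        out.append(ch)
    return "".join(out)


original = "\u20ac5 \u201cok\u201d"                      # euro sign, curly quotes
broken = original.encode("cp1252").decode("latin-1")    # the faulty conversion
fixed = fix_win1252_chars_sketch(broken)
print(repr(broken), fixed)
```

Characters outside U+0080 to U+009F pass through unchanged, so text that was converted correctly is not affected.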

Parameters

text

Any string.

Returns

The same string, UTF8 encoded

removeBOM()

removeBOM(str)

removeBOM

Parameters

str

Any string.

Returns

The same string, without a leading byte order mark (BOM)
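The source gives no description for this method. Assuming the name refers to the UTF-8 byte order mark (the three bytes EF BB BF at the start of a string), the operation could look like the Python sketch below; `remove_bom_sketch` is a hypothetical name.

```python
import codecs


def remove_bom_sketch(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "hola".encode("utf-8")
print(remove_bom_sketch(with_bom))
```

Only a BOM at the very start is removed; the same three bytes elsewhere in the string are left in place.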

normalizeEncoding()

normalizeEncoding(encodingLabel) : mixed|string|int

normalizeEncoding

Parameters

encodingLabel

Type of encoding: 'ISO88591', 'ISO8859', 'ISO', 'LATIN1', 'LATIN', 'UTF8', 'UTF', 'WIN1252', 'WINDOWS1252'

Returns

mixed|string|int

of equivalent encodings
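A label lookup of this kind can be sketched in Python as follows. The accepted labels come from the parameter description above; everything else is an assumption made for illustration: the name `normalize_encoding_sketch`, the stripping of punctuation and case before the lookup, the two target names (taken from the values listed for encode()), the mapping of the Windows-1252 labels to 'ISO-8859-1', and the 'UTF-8' fallback for unknown labels.

```python
import re

# Assumed grouping of the documented labels into two target encodings.
EQUIVALENCES = {
    "ISO88591": "ISO-8859-1",
    "ISO8859": "ISO-8859-1",
    "ISO": "ISO-8859-1",
    "LATIN1": "ISO-8859-1",
    "LATIN": "ISO-8859-1",
    "UTF8": "UTF-8",
    "UTF": "UTF-8",
    "WIN1252": "ISO-8859-1",      # assumption: grouped with ISO 8859-1
    "WINDOWS1252": "ISO-8859-1",  # assumption: grouped with ISO 8859-1
}


def normalize_encoding_sketch(label: str) -> str:
    """Hypothetical helper: reduce a loosely written label to a canonical name."""
    key = re.sub(r"[^A-Z0-9]", "", label.upper())  # "utf-8" -> "UTF8"
    return EQUIVALENCES.get(key, "UTF-8")          # assumed fallback


for label in ("utf-8", "Latin-1", "windows-1252", "iso_8859-1"):
    print(label, "->", normalize_encoding_sketch(label))
```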

encode()

encode(encodingLabel, text)

encode()

Parameters

encodingLabel

Type of encoding ('UTF-8' or 'ISO-8859-1')

text

Any string.

Returns

The same string, encoded accordingly