[Gambas-user] gb3 RC1: using structures to replace the loss of Mk$ functions

Doriano Blengino doriano.blengino at ...1909...
Fri Apr 8 08:32:20 CEST 2011


Kevin Fishburne wrote:
> On 04/07/2011 12:20 PM, Benoît Minisini wrote:
>    
>> When you write the data to the socket, all data are converted from the CPU
>> endianness (little endian for Intel/AMD) to the network endianness (big endian
>> by definition).
>>
>> When you read the data from the socket, the data must be converted back from
>> big endian to little endian, and it is done automatically by READ.
>>
>> But if you read everything inside a string, that is not done, because strings
>> are bytes, and so do not have endianness.
>>
>> Do you have an example of code that takes the socket data from a big message
>> like you described and decodes it?
>>
>> We will find a solution!
>>      
> Ahh, I understand a little better. I don't know much about endianness,
> so please excuse my ignorance here. When reading socket data into a
> string, how is the data affected exactly? I can think of a few possibilities:
>
> 1) The byte order of the entire string is reversed ("abcd" becomes "dcba")
> 2) The bit order of the entire string is reversed (00001111 becomes
> 11110000)
> 3) The byte order of the values is reversed (0000 1111, 1010 1100
> becomes 1111 0000, 1100 1010)
> 4) The bit order of the values is reversed (0000 1111, 1010 1100 becomes
> 1111 0000, 0011 0101)
>    
Endianness refers to the order in which the bytes are kept in memory, when a 
numerical multi-byte value is involved. Strictly speaking, strings are 
not affected. For example, human beings are big endian: the number 10, 
composed of two digits, is written with the most significant digit first. If 
human beings were little endian, they would write this number as "01".

Computers don't use base ten (well, not always), but the idea is the same. 
If you have a word composed of two bytes, you can write those bytes in 
two different orders: MSB (most significant byte, or "heavier" byte) 
first, or LSB first. The number 512 is formed by two bytes: MSB=2 
and LSB=0, written by a human being in hex as "&h0200". A computer can store 
(or send over a channel) those bytes in either order. The same mechanism 
applies to 4-byte numbers and to floating point numbers. These last ones 
could be a little different, because they have two parts, mantissa and 
exponent, but normally the mantissa is considered less significant than the 
exponent. Note that even strings can be affected, if they are UTF (or 
multi-byte, or whatever), because a single character can need two or more bytes.
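
To make this concrete, here is a tiny sketch in C (not Gambas) that 
prints the two bytes of the value 512 (&h0200) in the order the CPU 
actually stores them in memory:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t word = 0x0200;   /* 512: MSB = 0x02, LSB = 0x00 */
    unsigned char *bytes = (unsigned char *) &word;

    /* On a little-endian CPU (Intel/AMD) this prints "00 02", because
       the LSB is stored first; a big-endian CPU prints "02 00". */
    printf("%02X %02X\n", bytes[0], bytes[1]);
    return 0;
}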

So, Benoît is right when he says that this endianness has to do with the 
network. But this is not the whole story, because the same problem 
arises when a computer writes some data to a file, and this file is 
transferred to another computer. UTF is a clear example: files which 
contain UTF text can sometimes be problematic because it is not clear 
which endianness the UTF uses. Correctly composed files have a marker 
which clearly states the endianness (the BOM, or byte order mark). This 
is why some text editors try different encodings when opening a file, and 
assume they chose the wrong encoding if an illegal character is detected.
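
As a rough illustration (again in C; the file name "text.txt" is just a 
placeholder), checking the first two bytes of a file for a UTF-16 byte 
order mark could look like this:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("text.txt", "rb");
    if (!f)
        return 1;

    int b0 = fgetc(f);
    int b1 = fgetc(f);
    fclose(f);

    if (b0 == 0xFE && b1 == 0xFF)
        printf("UTF-16 big endian BOM\n");
    else if (b0 == 0xFF && b1 == 0xFE)
        printf("UTF-16 little endian BOM\n");
    else
        printf("no UTF-16 BOM found\n");

    return 0;
}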

While it is good that sending data over a network is an 
endianness-aware operation, it is also true that some easy way to play 
with single bytes could be handy. A swapendianness() or 
changeendianness() function could be used to swap the bytes inside a 
variable, be it two, four or eight bytes long. Just an idea.
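
Something along these lines, sketched in C (the names swap16, swap32 
and swap64 are invented here for illustration, they are not part of 
Gambas or of any existing proposal):

#include <stdint.h>

/* Reverse the byte order of a 2-, 4- or 8-byte value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t) ((v >> 8) | (v << 8));
}

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

static uint64_t swap64(uint64_t v)
{
    return ((uint64_t) swap32((uint32_t) (v & 0xFFFFFFFFu)) << 32) |
            (uint64_t) swap32((uint32_t) (v >> 32));
}

Applied to the example above, swapping the two bytes of &h0200 gives 
&h0002: the same value seen from "the other side".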

Regards,
Doriano
