This is a thread containing some information that will hopefully be helpful to beginners aspiring to learn C++. It contains some computer science knowledge that is a prerequisite to understanding C++ effectively.
#100DaysOfCode [1/75]
This is not a political thread. I mostly tweet politics, so if you're thinking about following me for more software engineering tweets, please keep that in mind. [2/75]
I think that the best way of learning a low-level language like C++ is to know how the computer runs it. Of course, there's a lot more involved than is covered in this thread (such as compilers and machine code), but this should help you get on your feet. [3/75]
This is not at all something you can use by itself to get programming in C++; instead, it is something that can complement other sources that are more focused on the code. This thread doesn't have much code in it. So let's begin. [4/75]
:: Bits and the Base-2 (Binary) Counting System [5/75]
When handled by computers, data is normally represented in 'bits' (a 'bit' is a value which can either be 1 or 0). How the data is represented like that depends on the type of data. [6/75]
For example, numbers can be represented using a base-2 counting system. This means that there are two possible digits (i.e. 0 and 1) with which a number can be represented. This is different from the way humans normally count: with a base-10 system (10 digits: 0-9). [7/75]
While counting in base-10, we move to the next digit (1 to 2 to 3...) until we reach the last digit (9), after which we increase the next place value and revert the current place value to 0. [8/75]
For example, the number after 8 is 9 (moving to the next digit). To get the number after that, since 9 is the last digit, we revert this place value (units) to 0 and increase the next (tens) to 1, getting 10. [9/75]
Similarly, counting in base-2, if we start at 0, we can move forward to the next digit: 1. For the next number, since 1 is the last digit in the base-2 system, we revert the current place value to 0 and increase the next place value to get 10. Likewise: 11, 100, 101, 110, 111, 1000... [10/75]
Bits are usually grouped into eight and called 'octets'. This will be further elaborated on later, but for now, 'octet' means 'eight bits'. [11/75]
Normally, while using base-2, we work in whole octets. If our number can be represented in fewer bits, like 10110, we just add zeroes to the left to make the length eight: 00010110. As in base-10, adding zeroes to the left doesn't change the number. [12/75]
With one octet, there are 256 possible combinations (00000000, 00000001, ..., 11111110, 11111111), which is quite low. To work around this, we sometimes use multiple octets (how many depends on the data type). [13/75]
When storing integers, we normally want to be able to store negative integers as well. This is called being 'signed' (having a positive or a negative sign). This is the default representation; the opposite is 'unsigned' and needs to be explicitly stated. [14/75]
Sometimes, we want to be able to store decimal numbers (numbers with a fractional part, like 2.5). [15/75]
Storing negative numbers is accomplished using a technique known as two's complement representation, and decimal numbers are usually stored in IEEE 754 floating-point format. As a beginner, you mostly do not need to worry about these. [16/75]
One important thing to keep in mind is that just as some fractions can't be accurately represented as a base-10 decimal (such as 1/3 being 0.333...), some can't be represented exactly in base-2 (such as 1/10). [17/75]
:: Base-16 (Hexadecimal) Counting System [18/75]
Like the base-2 (binary) counting system and the base-10 counting system humans use, another common system exists known as the base-16 (hexadecimal) counting system. This has 16 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F. [19/75]
Counting works in the same way as in other systems: increase the current place value; if it is the last digit, reset it to the first digit and increase the next place value. Counting from '8': 8, 9, A, B, ..., E, F, 10, 11, ..., 18, 19, 1A, 1B, ..., 1E, 1F, 20. [20/75]
Hexadecimal representations are used because they are less cumbersome than binary (where there are only two digits to use), and the conversion between hexadecimal and binary representations is very simple, only involving a table that maps each hexadecimal digit to a group of four bits. [21/75]
For instance, to convert the hexadecimal value 7C to a binary representation: 7 on the table is 0111 and C is 1100; therefore, 7C is 01111100. Similarly, 11000010 can be converted to hexadecimal: 1100 is C and 0010 is 2; therefore, 11000010 is C2. [22/75]
:: Memory [23/75]
Computers store data they are working with in a place called 'RAM', which stands for Random Access Memory. This can be thought of as a contiguous series of 'bytes'. For now, 'bytes' can be considered synonymous with octets. [24/75]
A RAM with a capacity of 4 bytes might look like this: 00111001 10001111 00000000 10011110. Each byte can be 'addressed' with a number counting up starting from 0. The first byte will be address 0, the second will be address 1 and so on. [25/75]
The byte with the value '00000000' in the example above could be referred to as 'address 2'. The processor can be given a memory address and told to write to it or read from it. For example, it can be told to write '11111111' to address 2. [26/75]
The memory will now look like this: 00111001 10001111 11111111 10011110. Note that the processor can't be told to write a single bit. For example, to change address 1 from 10001111 to 11001111, it is impossible to tell the processor to set just its second bit to '1'. [27/75]
Instead, the whole byte must be set to 11001111. This leads to the actual definition of a byte: the smallest addressable unit of memory. However, in virtually all cases, this is equal to an octet. [28/75]
This example had 4 bytes of memory. A decent computer nowadays has 8 gigabytes of memory. 1 gigabyte (as the term is used for RAM) is equal to 1073741824 bytes, which is 2 to the power of 30. [29/75]
Addresses are usually written as 16-digit hexadecimal (base-16) numbers with zeroes added to the left as necessary. For example, the sixteenth address (address 15) would be written as address 000000000000000F (F is hexadecimal for 15). [30/75]
Each program (process, to be precise) that executes on a computer is assigned a 'virtual address space' by the operating system. This is essentially a virtual memory given to a process. [31/75]
The address 000000000000000F could contain different values depending on which process it is being read from; the operating system maps them to different addresses in the actual RAM (one process might have it at 00000000C0150001 and the other at 00000000070A00FF). [32/75]
Before using an address, however, a program has to ask the operating system to 'allocate' memory for it. This means assigning a portion of RAM to the process's address space. The operating system also loads the program itself into RAM. [33/75]
The program may then use the address space allocated to it to store data it is working with. [34/75]
For our purposes, memory (including a virtual address space) is divided into two main sections: a heap, and a stack. [35/75]
Memory on the heap is allocated by the operating system when the program requests it. Allocating memory on the heap therefore results in a performance hit (heap allocations are 'expensive'). [36/75]
Memory on the stack doesn't require allocation like that and is therefore faster (the stack is 'cheap'). However, the stack is used for storing very temporary data. More permanent data is stored in the heap. [37/75]
:: Common Data Types [38/75]
An 'int' data type (C++) usually has a size of 4 octets (32 bits). This means that the value '0' stored in an int will look somewhat like this: 00000000 00000000 00000000 00000000, stored across four octets. [39/75]
Other types for storing numbers include char (1 octet), short (usually 2 octets) and long (8 octets on most 64-bit Linux and macOS systems, but 4 on Windows). Which one is used depends on the circumstance, but it is usually a good idea to go for the smallest type that fits your values in order to not waste memory. [40/75]
Characters are stored using an 'encoding' which maps certain values to certain letters. For example, with ASCII, the character 'x' is represented as 01111000. [41/75]
A type can also be made from other types. For example, we may want a type containing the location of a pixel. This location can be represented using two unsigned integers: one for the X value, and one for the Y. [42/75]
These can be placed adjacent to each other to make a compound 'point' type. For example, the value 1920 is represented in an unsigned integer as 00000000 00000000 00000111 10000000, and the value 1080 is represented as 00000000 00000000 00000100 00111000. [43/75]
Therefore, the point 1920x1080 can be represented as 00000000 00000000 00000111 10000000 00000000 00000000 00000100 00111000. This way of grouping data together is known as a 'struct' or 'class' in C++. [44/75]
All these data types have a known size: the number of octets the data type occupies. A struct containing 3 integers, for example, will have a length of (3 integers * 4 octets per integer) 12 octets. [45/75]
Some data types, however, do not have a known size. They will be discussed later on. [46/75]
:: Variables [47/75]
Imagine that in a programming language such as C++, we want to store the value '5' in memory. It will be extremely cumbersome to have to decide on an address for it and refer to it therewith (like 000000000000000F = 5). [48/75]
Instead, we can create a 'variable', which the computer will automatically understand to mean a certain memory address. We can name the variable something we like, such as myFive = 5. 'myFive' will then automatically refer to an address which will be set to '5'. [49/75]
In essence, a variable can be thought of as a named memory location. The value of the variable will be stored in the stack, which makes it very fast. The address of the stack may vary, but the computer handles that variation. [50/75]
For example, if the variable myFive is assigned the memory location 000000000000000A, and was declared as an 'int' (a size of 4 octets), it will occupy memory addresses 000000000000000A through 000000000000000D. [51/75]
The value of these addresses will be set to 00000000 00000000 00000000 00000101 (base-2/binary for 5). If the variable is then set to 7, the data at the same addresses (000000000000000A-000000000000000D) will be set to 00000000 00000000 00000000 00000111. [52/75]
If it is then read, four octets starting from memory address 000000000000000A will be read. The computer knows to read four octets because the data type 'int' has a size of four octets. All variables must have a fixed size. [53/75]
:: Pointers [54/75]
Sometimes, the value of a variable is actually a memory address that points to some data! This sort of variable is known as a pointer. [55/75]
For example, let's say there are two variables: myFive and myTen, with type int containing values 5 and 10 at addresses 0000000000000006 and 000000000000000A respectively. Suppose we want to have a variable 'pointing' to one of these variables. [56/75]
We can create a variable myPtr with the value 0000000000000006. This means that it is now pointing to a memory region that contains the value '5'. Note that the value of the pointer variable itself is still 0000000000000006. [57/75]
To access the value pointed to (5), we can 'dereference' the pointer (which means finding the value at a given address). In C++, this would be written as '*myPtr', with an asterisk. The value of '*myPtr' will be 5. [58/75]
We can now change the value of myPtr itself to 000000000000000A. Once we've done that, the data at addresses 0000000000000006-0000000000000009 (myFive) will still remain '5'; only the address pointed to by myPtr will change. [59/75]
Now we dereference myPtr as '*myPtr'. This will evaluate to 10, because the data at addresses 000000000000000A-000000000000000D contains 10. [60/75]
The computer knows to read four bytes, because when we make a pointer like this, the pointer's type also contains the type of the data being pointed to, int, which has a size of 4 octets. [61/75]
:: Data Types Involving Pointers [62/75]
Text is stored as a sequence of characters (known as a 'string'). The string "xx" for instance comprises two 'x' characters. An 'x' character is 01111000, so a string "xx" will contain 01111000 01111000, which will occupy two octets. [63/75]
However, since the length of the string is not known, there has to be a way of determining when a string ends. A common way to do this is to have a null octet (an octet with all bits set to zero; 00000000) signifying the end of the string. [64/75]
"xx" will therefore be: 01111000 01111000 00000000. This is called a 'null-terminated string'; the string ends where a null octet is found. Another way is to store the length of the string in the first octets of the string. [65/75]
The string "xx" is two characters long. Two is 00000000 00000000 00000000 00000010 when represented as a 32-bit integer, so the string "xx" can be represented as 00000000 00000000 00000000 00000010 01111000 01111000. [66/75]
The first four octets are looked at in order to determine the remaining length of the string, in octets. This is called a 'length-prefixed string'. [67/75]
Since there is no fixed size for this data type, it can't simply be stored in a variable. Instead, it has to be stored on the heap, with a pointer pointing at the address where the first octet of the string starts. [68/75]
If the string "xx" is stored starting at address 00000000C7000012, a variable containing that has to be a pointer with the value 00000000C7000012. String functions can then read the length prefix or read until a null character to know when to stop reading the data. [69/75]
Another data type that involves pointers is known as an array. An array is effectively a collection of a known number of elements of a particular type. For example, an int array is a collection of a known number of ints. [70/75]
The values of the array are stored adjacent to each other. An int array containing the values 1, 2, 3 would look like this: 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000011. [71/75]
Suppose this was stored at address 0000000012340000. To access the first element of the array, addresses 0000000012340000-0000000012340003 would be read; to access the second element, addresses 0000000012340004-0000000012340007 would be read; and so on. [72/75]
A variable, perhaps myArray, containing that array can be treated as a pointer to the first address in the array: 0000000012340000. The first element would be referred to as 'myArray[0]'; the last as 'myArray[2]'. Indexes start from zero. [73/75]
If the second value needs to be changed to 6, the C++ statement 'myArray[1] = 6' can be used, which will set addresses 0000000012340004-0000000012340007 to the value 00000000 00000000 00000000 00000110. [74/75]
So this is the end of this thread. I may or may not make something else like this in the future, so don't take my word for that I guess. Hopefully you've learned new things that are gonna help you in your low-level software engineering journey. All the best! [75/75]
You can follow @realm_of_sara.