Today I will be talking about some of the data structures we use regularly when doing data science work. I will start with numpy's ndarray.
What is an ndarray? It's numpy's abstraction for describing an array, or a group of numbers. In math terms, "array" is a catch-all term for vectors, matrices, and their higher-dimensional counterparts. Behind the scenes, an ndarray essentially describes a block of memory using several key attributes:
* pointer: the memory address of the first byte in the array
* type: the kind of elements in the array, such as floats or ints
* shape: the size of each dimension of the array (ex: 5 x 5 x 5)
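* strides: the number of bytes to skip to reach the next element along each dimension
* flags: information about the memory layout, such as whether the array is contiguous or writeable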
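To make the list above concrete, here is a minimal sketch that inspects each attribute on a small array. The array `x` and its 5 x 5 x 5 `float64` contents are illustrative assumptions, not something from the original thread.

```python
import numpy as np

# Illustrative array: 125 float64 values arranged as 5 x 5 x 5
x = np.arange(125, dtype=np.float64).reshape(5, 5, 5)

pointer = x.ctypes.data  # memory address of the first byte (an integer)
dtype = x.dtype          # element type: float64, 8 bytes each
shape = x.shape          # size of each dimension
strides = x.strides      # bytes to step along each dimension
flags = x.flags          # layout info: contiguity, writeability, ownership

print(hex(pointer))
print(dtype, shape, strides)
print(flags)
```

For this C-contiguous array the strides come out as `(200, 40, 8)`: moving one step along the last axis skips one 8-byte float, one step along the middle axis skips a row of 5 floats, and one step along the first axis skips a full 5 x 5 block.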
The `strides` attribute here is key. It allows you to subset or view data *without* copying it, which saves both time and memory. For example, when `y` is a slice of `x`, the two share memory, even though they aren't exactly the same array! This is very helpful when working with "big data."
This is why, if you've ever modified a slice of a numpy array, you ended up modifying the original array too!
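Here is a small sketch of that behavior. The specific arrays and values are my own illustrative assumptions and may differ from the example originally shown in the thread.

```python
import numpy as np

x = np.arange(10, dtype=np.int64)  # 0, 1, ..., 9
y = x[::2]                         # every other element: a view, not a copy

print(x.strides)                   # (8,)  one int64 per step
print(y.strides)                   # (16,) two int64s per step
print(np.shares_memory(x, y))      # True

# Writing through the slice changes the original array
y[0] = 100
print(x[0])                        # 100

# An explicit copy breaks the link
z = x[::2].copy()
z[1] = -1
print(x[2])                        # 2, unchanged
```

If you need an independent array, call `.copy()` on the slice as shown with `z`.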
The `strides` attribute is not only relevant when slicing arrays. Transposes, reshapes, and other operations can often take advantage of `strides` to avoid copying large amounts of data. Stay tuned for the next thread on vectorization.
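As a rough sketch of the point about transposes and reshapes, the example below shows both operations returning views with different strides. The array `a` is an illustrative assumption. The last step shows a case where numpy has to copy because no set of strides can describe the requested layout.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.strides)                 # (32, 8)

# Transpose: same memory, strides swapped
t = a.T
print(t.shape, t.strides)        # (4, 3) (8, 32)
print(np.shares_memory(a, t))    # True

# Reshape of a contiguous array: same memory, new strides
r = a.reshape(6, 2)
print(r.strides)                 # (16, 8)
print(np.shares_memory(a, r))    # True

# Flattening the transposed (non-contiguous) view forces a copy
c = t.reshape(12)
print(np.shares_memory(a, c))    # False
```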
You can follow @WomenInStat.