Handle huge files in Python

Several data science enthusiasts struggle to handle huge data files when they have only one machine available, because a file cannot be loaded whole if it is larger than the machine's available memory. In most data science projects, the file can be processed in parts to get the data required for modeling, for example with a group-by operation to get the mean, median, max, sum, or other values. This generates a smaller version of the file that fits in memory. The task, then, is how to split the big file into smaller chunks.

People familiar with Unix will say it can be done easily using shell commands or awk. The file can be split by rows, columns, column value, size, etc. Two common examples are below:

2. Unix command to split by file size. The -b argument defines the size of each output file. Again, the output files are named as in the example above.

Note that the awk command in Unix can also split a file into separate files by the values in a column, but that is out of scope for this blog.
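For readers without a Unix shell at hand, the row-based split can also be sketched in plain Python. This is only a minimal illustration, not the shell command the post refers to: the function name `split_by_lines`, the `part_` prefix, and the numbered `.txt` output names are assumptions made for the example.

```python
import os


def split_by_lines(path, lines_per_file, out_dir, prefix="part_"):
    """Copy `path` into numbered files of at most `lines_per_file` lines each.

    The input is streamed line by line, so only one line is held in
    memory at a time. Returns the list of output file paths.
    Note: a header row, if present, ends up only in the first output file.
    """
    out_paths = []
    out = None
    try:
        with open(path, "r", encoding="utf-8") as src:
            for i, line in enumerate(src):
                # Start a new output file every `lines_per_file` lines.
                if i % lines_per_file == 0:
                    if out is not None:
                        out.close()
                    name = f"{prefix}{len(out_paths):04d}.txt"
                    out_path = os.path.join(out_dir, name)
                    out = open(out_path, "w", encoding="utf-8")
                    out_paths.append(out_path)
                out.write(line)
    finally:
        # Close the last output file even if reading fails midway.
        if out is not None:
            out.close()
    return out_paths
```

Calling `split_by_lines("big.csv", 1_000_000, "chunks")` would write files of up to one million rows each into an existing `chunks` directory; both names here are placeholders.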

So, having split the data into smaller parts, a data scientist can do the processing on a single machine. Now, what if someone does not want to write extra lines to split the data? The good news is that pandas in Python can handle these cases: it can read a big file in chunks and process each chunk on the fly.
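A minimal sketch of chunked processing with pandas follows, assuming a CSV input and a group-by aggregation like the one described earlier. The function name `chunked_group_sum_mean` and the column names used in the usage note are hypothetical; `chunksize` is the `pandas.read_csv` parameter that makes it return an iterator of DataFrames instead of one large DataFrame.

```python
import pandas as pd


def chunked_group_sum_mean(source, key, value, chunksize=100_000):
    """Per-group sum, count, and mean of `value`, computed chunk by chunk.

    `source` is a path or file-like object accepted by pandas.read_csv.
    Only `chunksize` rows are held in memory at any time, plus one small
    partial result per chunk.
    """
    partials = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Reduce each chunk to one row per group before keeping it.
        partials.append(chunk.groupby(key)[value].agg(["sum", "count"]))

    # The same group can appear in several chunks, so add up the
    # partial sums and counts per group, then derive the mean.
    totals = pd.concat(partials).groupby(level=0).sum()
    totals["mean"] = totals["sum"] / totals["count"]
    return totals
```

For example, `chunked_group_sum_mean("sales.csv", "city", "amount")` would return one row per city, where the file and column names are placeholders. Sum, count, max, and mean can be combined across chunks this way; an exact median cannot, because the median of per-chunk medians is generally not the median of the whole column.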

For R users, several packages are available, such as data.table. I may cover them in a later blog.

Hope it helps. If you liked this blog, please clap to keep me motivated to share more!
