From the course: Data Science Foundations: Python Scientific Stack [CoderPad]

Examine data

- [Instructor] We're going to look at some data about taxi rides published by New York City. Before you can start, you need to run the download data script in the video folder. This will download the taxi.csv file. It might take a little while, so I suggest you keep this file around and copy it over from chapter to chapter.

Let's see if you can safely load this data into memory. So from pathlib we import Path, create the path to the CSV file, define what a megabyte is, and then use the stat method of the file to get its size, which is in bytes, and divide it by a megabyte. Let's run this. It's about 163 megabytes, so it is safe to load into memory.

Next, we're going to have a look at a few of the initial lines and see how many lines there are in total. Initialize the number of lines, open the file, and use enumerate. If the line is one of the first five lines, I'm going to print it out. And always increment the counter. Finally, print out the number of lines. When we run this one, you're going to see we have the vendor ID, the pickup and drop-off times, the passenger count, the distance, and many more attributes. And we have about a million lines.
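The two steps described in the video can be sketched roughly as follows. This is a minimal reconstruction from the narration, not the instructor's exact code: the function names `size_in_mb` and `preview_and_count` are hypothetical, the megabyte is assumed to be 2**20 bytes, and taxi.csv is assumed to sit in the current working directory.

```python
from pathlib import Path

# Assumption: a megabyte is taken as 2**20 bytes; the video does not say
# whether it uses 2**20 or 10**6.
MB = 2 ** 20


def size_in_mb(csv_file: Path) -> float:
    """Return the size of csv_file in megabytes (stat reports bytes)."""
    return csv_file.stat().st_size / MB


def preview_and_count(csv_file: Path, n_preview: int = 5) -> int:
    """Print the first n_preview lines and return the total number of lines."""
    num_lines = 0
    with csv_file.open() as fp:
        for lnum, line in enumerate(fp):
            if lnum < n_preview:
                # Lines already end with a newline, so suppress print's own.
                print(line, end="")
            num_lines += 1  # always increment the counter
    return num_lines


# Assumption: taxi.csv is in the current directory after running the
# download script; skip quietly if it has not been downloaded yet.
csv_file = Path("taxi.csv")
if csv_file.exists():
    print(f"{size_in_mb(csv_file):.2f} MB")
    print(f"{preview_and_count(csv_file):,} lines")
```

Iterating over the file object reads one line at a time, so the line count never needs the whole file in memory, which is useful when the size check says the file is too large to load.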
