8 Python Modules for File Handling

Python's input/output features are easy to use. Files are accessed through file objects, and you can open, read, write and close them with a few simple functions from the standard library. To manage files and handle special file formats (XML, JSON, ...), Python provides dedicated packages that make the developer's life even easier.

Filtering Files – fileinput module

If you write automation scripts, you are probably familiar with commands like grep, head and tail. These commands take one or more files as input and filter their content to the output. If you want to write a similar tool in Python, the fileinput module is very helpful:

import fileinput

for line in fileinput.input():
    print(fileinput.filename())
    # strip the trailing newline so the line number stays on the same line
    print(line.rstrip('\n') + ':' + str(fileinput.lineno()))

Run this script with a list of filenames or patterns as parameters:

# inpdemo *.txt

The script loops over all the lines in all the txt files, printing the file name, line content and line number. To write a filter program, just add conditions:

import sys, fileinput, re, glob

# the first argument is the pattern; the remaining arguments are the files
pattern = sys.argv.pop(1)

# uncomment on Windows, where the shell does not expand wildcards
#sys.argv[1:] = glob.glob(sys.argv[1])

for line in fileinput.input():
    res = re.search(pattern, line)
    if res:
        print(fileinput.filename())
        print(line.rstrip('\n') + ':' + str(fileinput.lineno()))

Using the script (searching for lines that contain the string hello):

# inpdemo hello *.txt

 

Serialization with pickle

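The pickle module converts Python objects into a stream of bytes that is usually written to a file or sent across a network. To use pickle, open a file in binary mode and dump or load your objects. Only unpickle data you trust, because loading a pickle can execute arbitrary code.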

import pickle

d1 = {'name': 'liran', 'id': 1000, 'age': 45}

# 'wb' opens the file for writing in binary mode
with open('customer', 'wb') as outp:
    pickle.dump(d1, outp)

And load:

import pickle

with open('customer', 'rb') as inp:
    cust = pickle.load(inp)
print(cust)

Types supported by pickle:

  • None, booleans and numbers (int, float, complex)
  • strings, bytes, bytearrays
  • collections of picklable objects (set, list, tuple, dictionary)

Custom types (instances of your own classes) are supported too:

import pickle


class Student(object):
    def __init__(self, id=0, name=''):
        self.__id = id
        self.__name = name

    def pr_student(self):
        print("id=" + str(self.__id) + " name=" + str(self.__name))


s = Student(100, 'avi')
s.pr_student()

with open('students', 'wb') as outp:
    pickle.dump(s, outp)

# the Student class must be importable when the object is loaded
with open('students', 'rb') as inp:
    d = pickle.load(inp)
d.pr_student()

Note that pickle.dump accepts a protocol argument that selects the version of the serialization format (see the pickle documentation).
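As a rough illustration of the protocol argument, the sketch below serializes the same dictionary with the oldest protocol (0) and with pickle.HIGHEST_PROTOCOL. It uses pickle.dumps and pickle.loads, which work on bytes in memory, so no file is needed. The variable names are chosen only for this example.

```python
import pickle

record = {'name': 'liran', 'id': 1000, 'age': 45}

# Protocol 0 is the original text-based format; HIGHEST_PROTOCOL is the
# newest format supported by the running interpreter.
oldest = pickle.dumps(record, protocol=0)
newest = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# The byte streams differ, but both load back to an equal object.
print(len(oldest), len(newest))
print(pickle.loads(oldest) == pickle.loads(newest) == record)
```

A higher protocol is usually the better choice unless the file must be read by an older Python version.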

 

File Compression with bz2, gzip

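With the bz2 and gzip modules, you can read and write compressed files. Both provide an open function that returns a file object, so it can be passed directly to pickle: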

import pickle, bz2

# Student is the class defined in the pickle section above
s = Student(100, 'avi')
s.pr_student()

# gzip.open can be used in the same way
with bz2.open('customer.bz2', 'wb') as outp:
    pickle.dump(s, outp)

with bz2.open('customer.bz2', 'rb') as inp:
    d = pickle.load(inp)
d.pr_student()
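The gzip module follows the same pattern. Here is a minimal sketch that pickles a plain dictionary into a gzip file and reads it back; the file name customer.gz and the temporary directory are assumptions made for the demonstration.

```python
import gzip
import os
import pickle
import tempfile

customer = {'name': 'liran', 'id': 1000, 'age': 45}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'customer.gz')  # hypothetical file name

    # gzip.open returns a file object that compresses on write.
    with gzip.open(path, 'wb') as outp:
        pickle.dump(customer, outp)

    # Reading decompresses transparently before pickle sees the bytes.
    with gzip.open(path, 'rb') as inp:
        restored = pickle.load(inp)

print(restored)
```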

 

Pickling on a large scale with shelve

If you are going to use pickle on a large scale, the shelve module stores pickled objects in a database file, each under a string key:

import shelve

# Student is the class defined in the pickle section above
s = Student(100, 'avi')
s.pr_student()

outp = shelve.open('customer.dat')
outp['c1'] = s
outp.close()

inp = shelve.open('customer.dat')
d = inp['c1']
d.pr_student()
inp.close()
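A shelf behaves much like a dictionary, so it can hold several objects and list its keys. The sketch below assumes plain dictionaries as values and a temporary directory for the database file; both are choices made only for this example.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'customers')  # hypothetical database name

    # The with statement closes the shelf, which flushes it to disk.
    with shelve.open(path) as db:
        db['c1'] = {'name': 'liran', 'id': 1000}
        db['c2'] = {'name': 'avi', 'id': 100}

    # Reopen the shelf and read the stored objects back by key.
    with shelve.open(path) as db:
        keys = sorted(db.keys())
        second = db['c2']

print(keys, second)
```

Keep in mind that changing a stored object in place (for example db['c1']['id'] = 5) is not saved unless the shelf is opened with writeback=True or the object is assigned back to its key.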

 

JSON Files

JSON files are very useful for saving data offline, storing configuration and more. In the following example we use the json module to dump a list that contains a dictionary to a JSON file:

import json

# named data rather than list, to avoid shadowing the built-in list type
data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
with open("dict.json", 'w') as d:
    json.dump(data, d)

The generated file (note the conversions: the tuple becomes an array and None becomes null):

["foo", {"bar": ["baz", null, 1.0, 2]}]
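Reading the data back shows that these conversions are not reversed. The sketch below uses json.dumps and json.loads, the string-based counterparts of dump and load, so that it runs without creating a file.

```python
import json

data = ['foo', {'bar': ('baz', None, 1.0, 2)}]

text = json.dumps(data)      # serialize to a JSON string
restored = json.loads(text)  # parse the string back into Python objects

print(text)
print(restored)
# The tuple came back as a list, so the round trip is not an exact copy.
print(restored == data)
```

If the distinction between tuples and lists matters to your program, convert the values yourself after loading.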

 

XML Files

You can find many modules and packages for handling and parsing XML files. One simple module is xml.dom.minidom.

Parsing an XML string:

import xml.dom.minidom
doc = xml.dom.minidom.parseString('<site>devarea.com</site>')

Parsing an XML file:

doc = xml.dom.minidom.parse('sites.xml')

And navigating the DOM object:

print(doc.childNodes)
print(doc.firstChild.tagName)
...

The DOM object offers many more properties and methods for navigation.
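For instance, a slightly larger document can be walked with getElementsByTagName, getAttribute and the data of a text node. This is only a sketch: the sites element, the lang attribute and the second site are invented for the demonstration.

```python
import xml.dom.minidom

# Hypothetical document: only devarea.com appears in the examples above.
source = (
    '<sites>'
    '<site lang="en">devarea.com</site>'
    '<site lang="he">example.org</site>'
    '</sites>'
)

doc = xml.dom.minidom.parseString(source)
root = doc.documentElement  # the top-level <sites> element

names = []
for node in root.getElementsByTagName('site'):
    # firstChild is the text node inside <site>; data holds its string.
    names.append((node.getAttribute('lang'), node.firstChild.data))

print(root.tagName)
print(names)
```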

 

CSV Files

CSV files are used to store tables. Most database systems and spreadsheet programs can import and export data in CSV format. Use the csv module to handle CSV files:

import csv

# newline='' lets the csv module handle line endings itself
with open('students.csv', newline='') as my_file:
    reader = csv.DictReader(my_file)
    for row in reader:
        print(row['name'], row['city'])

DictReader converts each row to a dictionary whose keys are taken from the header row of the file.
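Writing works the other way around with csv.DictWriter. The sketch below writes two made-up student rows to an in-memory buffer (io.StringIO stands in for a real file) and reads them back with DictReader.

```python
import csv
import io

# Hypothetical sample rows, used only for this demonstration.
students = [
    {'name': 'avi', 'city': 'Haifa'},
    {'name': 'liran', 'city': 'Tel Aviv'},
]

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=['name', 'city'])
writer.writeheader()        # first line: name,city
writer.writerows(students)  # one line per dictionary

buffer.seek(0)              # rewind before reading the data back
rows = [dict(row) for row in csv.DictReader(buffer)]

print(rows)
```

With a real file, open it with newline='' for writing as well, just as for reading.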

 

Configuration Files – INI files

Warning!!! – Not a Windows fan? Skip this section.

 

Windows uses INI files for settings and configuration. The configparser module helps you write and parse those files:

from configparser import ConfigParser

config = ConfigParser()

config.add_section('GLOBALS')
config.set('GLOBALS', 'TRACE', 'True')
config.add_section('FILENAMES')
config.set('FILENAMES', 'DIR', 'myapp')
# %(dir)s is replaced with the value of dir when the option is read
config.set('FILENAMES', 'MASTER', '%(dir)s\\master')
config.set('FILENAMES', 'SLAVE', '%(dir)s\\slave')

with open("config.ini", "w") as fh:
    config.write(fh)

# now read the file
config.read('config.ini')
master = config.get('FILENAMES', 'master')
print(master)
print(config.getboolean('GLOBALS', 'TRACE'))

The generated file (option names are stored in lowercase):

[GLOBALS]
trace = True

[FILENAMES]
dir = myapp
master = %(dir)s\master
slave = %(dir)s\slave
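A parser can also be used like a dictionary of sections. The sketch below loads similar content from a string with read_string, so it needs no file, and compares the interpolated value of master with its raw form. The backup option is a made-up name that is deliberately missing, to show the fallback value.

```python
from configparser import ConfigParser

# Similar content to the generated file, kept in a string for the demo.
ini_text = """
[GLOBALS]
trace = True

[FILENAMES]
dir = myapp
master = %(dir)s\\master
"""

config = ConfigParser()
config.read_string(ini_text)

files = config['FILENAMES']                # dictionary-style section access
master = files['master']                   # %(dir)s is expanded to myapp
raw_master = config.get('FILENAMES', 'master', raw=True)  # no expansion
trace = config['GLOBALS'].getboolean('trace')
missing = files.get('backup', 'none')      # fallback for an absent option

print(master, raw_master, trace, missing)
```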

 

 

 

 

 

 


5 thoughts on “8 Python Modules for File Handling”

  1. Yaml is another great choice for config files and it works well in many languages.

  2. Thank you very much for this great tutorial.
    It’s helpful to me.

  3. For XML, I would bring up the xml.etree.ElementTree as well. The DOM model is cumbersome and ElementTree makes a lot of things much easier.

  4. simplejson has some advances over json, you could show that, too

  5. I’m using very easy “configobj” – reads conf file from type key=value and can use key as global variable
