Python's input/output features are very easy to use. Files are accessed through file objects, and you can open, read, write and close them with a few simple functions from the standard library. To manage files and deal with specific file formats (XML, JSON, ...), Python provides dedicated modules that make the developer's life even easier.
Filtering Files – fileinput module
If you write automation scripts, you are probably familiar with commands like grep, head and tail. These commands take one or more files as input and filter their content to the output. If you want to write a similar tool in Python, the fileinput module is very helpful:
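    import fileinput

    # With no arguments, fileinput.input() reads the files named in sys.argv[1:]
    for line in fileinput.input():
        print(fileinput.filename())
        # Strip the trailing newline so the line number stays on the same line
        print(line.rstrip('\n') + ':' + str(fileinput.lineno()))

Run this script with a list of filenames or patterns as parameters: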
# inpdemo *.txt
The script loops over all the lines in all txt files, printing the file name, line content and line number. To write a filter program, just add conditions:
    import sys, fileinput, re, glob

    # The first argument is the search pattern; the rest are file names
    pattern = sys.argv.pop(1)
    # Uncomment on Windows, where the shell does not expand wildcards:
    # sys.argv[1:] = glob.glob(sys.argv[1])
    for line in fileinput.input():
        res = re.search(pattern, line)
        if res:
            print(fileinput.filename())
            print(line.rstrip('\n') + ':' + str(fileinput.lineno()))

Using the script (searching for lines containing the string hello):
# inpdemo hello *.txt
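The same filter can also be wrapped in a function that takes the file list explicitly instead of reading it from sys.argv. The following is only a sketch: the function name grep and the sample files are made up for the demonstration, and it uses filelineno(), which restarts at 1 for each file, rather than the cumulative lineno().

```python
import fileinput
import os
import re
import tempfile


def grep(pattern, paths):
    """Return (filename, line number in file, text) for every matching line."""
    regex = re.compile(pattern)
    matches = []
    # Passing files= explicitly avoids relying on sys.argv
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if regex.search(line):
                matches.append(
                    (stream.filename(), stream.filelineno(), line.rstrip("\n"))
                )
    return matches


# Hypothetical sample files, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.txt")
    second = os.path.join(tmp, "b.txt")
    with open(first, "w") as f:
        f.write("hello world\nnothing here\n")
    with open(second, "w") as f:
        f.write("skip me\nsay hello\n")
    results = grep("hello", [first, second])

for name, number, text in results:
    print(os.path.basename(name), number, text)
```

Returning a list instead of printing makes the filter easy to reuse from other scripts.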
Serialization with pickle
The pickle module converts Python objects into a stream of bytes, usually written to a file or sent across a network. To use pickle, open a file in binary mode and dump or load your objects:

    import pickle

    d1 = {'name': 'liran', 'id': 1000, 'age': 45}
    outp = open('customer', 'wb')
    pickle.dump(d1, outp)
    outp.close()

And to load:

    import pickle

    inp = open('customer', 'rb')
    cust = pickle.load(inp)
    print(cust)
    inp.close()
Types supported by pickle:
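- None, booleans and numbers (int, float, complex)
- strings, bytes, bytearrays
- collections containing only picklable objects (set, list, tuple, dictionary)
- custom types (instances of classes defined at the top level of a module):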
    import pickle

    class Student(object):
        def __init__(self, id=0, name=''):
            self.__id = id
            self.__name = name

        def pr_student(self):
            print("id=" + str(self.__id) + " name=" + str(self.__name))

    s = Student(100, 'avi')
    s.pr_student()

    # Save the object
    outp = open('students', 'wb')
    pickle.dump(s, outp)
    outp.close()

    # Load it back
    inp = open('students', 'rb')
    d = pickle.load(inp)
    d.pr_student()
    inp.close()
Note the protocol argument of pickle.dump, which selects the version of the serialization format (see the pickle documentation).
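To make the protocol remark concrete, here is a small sketch that serializes the same dictionary with two different protocols. It uses pickle.dumps and pickle.loads, which work on bytes objects instead of files; the variable names are only illustrative.

```python
import pickle

record = {'name': 'liran', 'id': 1000, 'age': 45}

# Protocol 0 is the oldest, ASCII-based format
oldest = pickle.dumps(record, protocol=0)
# HIGHEST_PROTOCOL is the newest format this interpreter supports
newest = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol on its own, so no argument is needed
restored_old = pickle.loads(oldest)
restored_new = pickle.loads(newest)

print(restored_old == restored_new == record)
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```

A higher protocol is usually more compact, but the resulting data cannot be read by Python versions that predate it.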
File Compression with bz2, gzip
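With the bz2 and gzip modules, you can write to and read from compressed files:

    import pickle, bz2

    # Student is the class defined in the pickle section above
    s = Student(100, 'avi')
    s.pr_student()

    # bz2.open returns a file object that compresses on write;
    # gzip.open can be used the same way
    outp = bz2.open('customer.bz2', 'wb')
    pickle.dump(s, outp)
    outp.close()

    inp = bz2.open('customer.bz2', 'rb')
    d = pickle.load(inp)
    d.pr_student()
    inp.close()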
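For completeness, a gzip version might look like the sketch below. The file name customer.gz and the temporary directory are assumptions made so the example cleans up after itself, and a plain dictionary stands in for the Student object.

```python
import gzip
import os
import pickle
import tempfile

data = {'name': 'liran', 'id': 1000, 'age': 45}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'customer.gz')  # hypothetical file name

    # gzip.open in binary mode behaves like a regular binary file object
    with gzip.open(path, 'wb') as outp:
        pickle.dump(data, outp)

    with gzip.open(path, 'rb') as inp:
        restored = pickle.load(inp)

print(restored)
```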
Pickling on a large scale with shelve
If you are going to use pickle on a large scale, the shelve module uses a database file to store pickled objects under string keys:

    import shelve

    # Student is the class defined in the pickle section above
    s = Student(100, 'avi')
    s.pr_student()

    # Store the object under the key 'c1'
    outp = shelve.open('customer.dat')
    outp['c1'] = s
    outp.close()

    # Reopen the shelf and fetch the object by its key
    inp = shelve.open('customer.dat')
    d = inp['c1']
    d.pr_student()
    inp.close()
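A shelf behaves much like a dictionary, so you can also list its keys and test for membership. The sketch below assumes a throwaway shelf in a temporary directory and stores plain dictionaries; the keys c1 and c2 are made up.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'customers')  # hypothetical shelf name

    # The with statement closes the shelf automatically
    with shelve.open(path) as db:
        db['c1'] = {'name': 'liran', 'id': 1000}
        db['c2'] = {'name': 'avi', 'id': 100}

    with shelve.open(path) as db:
        keys = sorted(db.keys())   # keys are always strings
        second = db['c2']          # values are unpickled on access
        has_c3 = 'c3' in db

print(keys, second, has_c3)
```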
JSON Files
JSON files are very useful for saving data offline, storing configuration and more. In the following example we use the json module to dump a list containing a dictionary to a JSON file:

    import json

    # Named data rather than list, to avoid shadowing the built-in list type
    data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
    with open("dict.json", 'w') as d:
        json.dump(data, d)

The generated file (note the conversions: the tuple becomes an array and None becomes null):
["foo", {"bar": ["baz", null, 1.0, 2]}]
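Reading the data back shows that these conversions are not reversed. This sketch uses json.dumps and json.loads, the string counterparts of dump and load, so that no file is needed.

```python
import json

data = ['foo', {'bar': ('baz', None, 1.0, 2)}]

text = json.dumps(data)       # serialize to a string instead of a file
restored = json.loads(text)   # parse it back into Python objects

print(text)
print(restored)
# The tuple came back as a list, so the round trip is not an exact copy
print(restored == data)
```

If your code depends on tuples (or on other types JSON does not have), convert them back yourself after loading.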
XML Files
You can find many modules and packages for handling and parsing XML files. One simple module is xml.dom.minidom.
Parsing an XML string:

    import xml.dom.minidom

    doc = xml.dom.minidom.parseString('<site>devarea.com</site>')

Parsing an XML file:

    doc = xml.dom.minidom.parse('sites.xml')

And navigating the DOM object:

    print(doc.childNodes)
    print(doc.firstChild.tagName)
    ...

The DOM object offers many more attributes and methods for walking the tree.
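As a slightly larger navigation sketch, the snippet below collects an attribute and the text of every site element. The XML document and its lang attribute are invented for the demonstration.

```python
import xml.dom.minidom

# Hypothetical document, used only for this example
source = (
    '<sites>'
    '<site lang="en">devarea.com</site>'
    '<site lang="en">python.org</site>'
    '</sites>'
)

doc = xml.dom.minidom.parseString(source)
root = doc.documentElement          # the top-level <sites> element

sites = []
for node in root.getElementsByTagName('site'):
    # firstChild is the text node inside <site>; data is its content
    sites.append((node.getAttribute('lang'), node.firstChild.data))

print(root.tagName)
print(sites)
```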
CSV Files
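CSV files are used to store tabular data. Most database systems can import and export data in CSV format. Use the csv module to handle CSV files:

    import csv

    # Assumes students.csv starts with a header row containing name and city
    # newline='' lets the csv module handle line endings itself
    with open('students.csv', newline='') as my_file:
        reader = csv.DictReader(my_file)
        for row in reader:
            print(row['name'], row['city'])

The DictReader converts each row to a dictionary keyed by the column names in the header row.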
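Writing works the same way in the other direction with csv.DictWriter. The sketch below writes to an in-memory buffer instead of a real file and then reads the rows back; the student names and cities are made-up sample data.

```python
import csv
import io

buffer = io.StringIO()  # stands in for a file opened with newline=''

writer = csv.DictWriter(buffer, fieldnames=['name', 'city'])
writer.writeheader()    # writes the "name,city" header row
writer.writerow({'name': 'avi', 'city': 'Haifa'})
writer.writerow({'name': 'liran', 'city': 'Tel Aviv'})

# Rewind and read the same data back as dictionaries
buffer.seek(0)
rows = list(csv.DictReader(buffer))

for row in rows:
    print(row['name'], row['city'])
```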
Configuration Files – INI files
Warning!!! – Not a windows fan? skip this section
Windows uses INI files for settings and configuration. The configparser module helps you write and parse those files:

    from configparser import ConfigParser

    config = ConfigParser()
    config.add_section('GLOBALS')
    config.set('GLOBALS', 'TRACE', 'True')
    config.add_section('FILENAMES')
    config.set('FILENAMES', 'DIR', 'myapp')
    # %(dir)s is replaced by the value of dir when the option is read
    config.set('FILENAMES', 'MASTER', '%(dir)s\\master')
    config.set('FILENAMES', 'SLAVE', '%(dir)s\\slave')
    fh = open("config.ini", "w")
    config.write(fh)
    fh.close()

    # now read the file
    config.read('config.ini')
    master = config.get('FILENAMES', 'master')
    print(master)
    print(config.getboolean('GLOBALS', 'TRACE'))

The generated file (option names are stored in lowercase):

    [GLOBALS]
    trace = True

    [FILENAMES]
    dir = myapp
    master = %(dir)s\master
    slave = %(dir)s\slave
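To see what the parser does with such a file, here is a sketch that loads similar content from a string with read_string, so no file is required. The backup option is deliberately missing to show the fallback argument.

```python
from configparser import ConfigParser

# Same layout as the generated file; \\ in the literal is a single backslash
ini_text = """
[GLOBALS]
trace = True

[FILENAMES]
dir = myapp
master = %(dir)s\\master
"""

config = ConfigParser()
config.read_string(ini_text)

sections = config.sections()
# Option names are case-insensitive, and %(dir)s is expanded on read
master = config.get('FILENAMES', 'MASTER')
# raw=True returns the value exactly as stored, without interpolation
raw_master = config.get('FILENAMES', 'master', raw=True)
trace = config.getboolean('GLOBALS', 'trace')
# fallback is returned when the option does not exist
backup = config.get('FILENAMES', 'backup', fallback='none')

print(sections, master, raw_master, trace, backup)
```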
5 thoughts on “8 Python Modules For Files Handling”
Yaml is another great choice for config files and it works well in many languages.
Thank you very much for this great tutorial.
It’s helpful to me.
For XML, I would bring up the xml.etree.ElementTree as well. The DOM model is cumbersome and ElementTree makes a lot of things much easier.
simplejson has some advances over json, you could show that, too
I’m using very easy “configobj” – reads conf file from type key=value and can use key as global variable