Jan 01, 2016

vCard to CSV Using Python Script

Recently, a client from the United Kingdom (UK) asked me to scrape data from a website that showed basic details on each web page along with a link to a vCard file. The vCard file held the rest of the details, such as contact name, website, email and phone number.


I tried several third-party web scraping tools to see whether any of them could scrape data from vCard files, but unfortunately none of them worked. I then decided to download all the vCard files locally and parse their content using either PHP or Python.
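The download step could look roughly like the sketch below, written for Python 3. The function name download_vcards and the numbered contact_0001.vcf file names are assumptions made for illustration, not part of the original project; it simply saves whatever each URL returns into a local folder and skips URLs that fail.

```python
import os
import urllib.request


def download_vcards(urls, dest_dir):
    """Save the content of each URL as a numbered .vcf file in dest_dir.

    Returns the list of file paths that were written. The numbered file
    names are an assumption of this sketch.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        path = os.path.join(dest_dir, "contact_%04d.vcf" % index)
        try:
            with urllib.request.urlopen(url) as response:
                content = response.read()
        except (OSError, ValueError) as error:
            # URLError/HTTPError are OSError subclasses; ValueError covers
            # malformed URLs. Skip the entry and carry on with the rest.
            print("skipped %s: %s" % (url, error))
            continue
        with open(path, "wb") as handle:
            handle.write(content)
        saved.append(path)
    return saved
```

Called as download_vcards(list_of_urls, "vcards"), it would leave one .vcf file per reachable URL in the vcards folder, ready to be parsed offline.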

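In the end I wrote a Python script that reads all the downloaded vCard files, parses the data and stores it in a CSV file.

Here is the code to parse contact details from vCard files. It reads a text file with one vCard URL per line and writes the name, email and website of each contact to a CSV file: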

# Python 3 version of the script
import csv
import urllib.request


def get_data(csv_write, url):
    """Fetch one vCard from url and write its name, email and website as a CSV row."""
    data = ""
    print(url)
    try:
        # vCard files are normally UTF-8; undecodable bytes are replaced
        data = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    except Exception as e:
        # A failed download still produces an empty row for this URL
        print(e)
    email = ""
    name = ""
    website = ""
    # Parsing email, name and website from vCard
    # (splitlines() also drops the '\r' of vCard's CRLF line endings)
    for line in data.splitlines():
        if line.startswith("FN:"):
            name = line[len("FN:"):]
        if line.startswith("URL;WORK:"):
            website = line[len("URL;WORK:"):]
        if line.startswith("EMAIL;TYPE=INTERNET;TYPE=PREF:"):
            email = line[len("EMAIL;TYPE=INTERNET;TYPE=PREF:"):]
    csv_write.writerow([name, email, website])


if __name__ == "__main__":
    input_file_name = input("Enter the vCard URL file (.txt) : ")
    output_file_name = input("Enter the output file (.csv) : ")
    try:
        with open(input_file_name) as f:
            lines = f.read().splitlines()

        # storing data to csv file
        with open(output_file_name, "w", newline="", encoding="utf-8") as output:
            writer = csv.writer(output, dialect=csv.excel, quoting=csv.QUOTE_ALL)
            writer.writerow(["Name", "Email", "Website"])
            for url in lines:
                if url.strip():  # ignore blank lines in the URL file
                    get_data(writer, url.strip())
    except Exception as e:
        print(e)
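The script matches three exact prefixes, so a vCard that writes its properties slightly differently (for example EMAIL;TYPE=WORK: or item1.URL:) would come out blank, and the phone number is not captured at all. Below is a minimal Python 3 sketch of a more tolerant variant that reads .vcf files from a local folder. The names parse_vcard and vcards_to_csv are hypothetical, and it assumes one contact per file, keeps only the first value found for each field, and does not handle vCard escape sequences or quoted parameters containing a colon.

```python
import csv
import glob
import os


def parse_vcard(text):
    """Return name, email, website and phone from the text of one vCard."""
    # Undo vCard line folding: a line that starts with a space or tab
    # is a continuation of the previous line.
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    wanted = {"FN": "name", "EMAIL": "email", "URL": "website", "TEL": "phone"}
    contact = {"name": "", "email": "", "website": "", "phone": ""}
    for line in lines:
        head, sep, value = line.partition(":")
        if not sep:
            continue
        # "item1.EMAIL;TYPE=INTERNET" -> "EMAIL"
        prop = head.split(";")[0].split(".")[-1].upper()
        field = wanted.get(prop)
        # Keep the first value seen for each field
        if field and not contact[field]:
            contact[field] = value.strip()
    return contact


def vcards_to_csv(folder, csv_path):
    """Write one CSV row per .vcf file in folder; return the number of files."""
    paths = sorted(glob.glob(os.path.join(folder, "*.vcf")))
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        writer.writerow(["Name", "Email", "Website", "Phone"])
        for path in paths:
            with open(path, encoding="utf-8", errors="replace") as handle:
                c = parse_vcard(handle.read())
            writer.writerow([c["name"], c["email"], c["website"], c["phone"]])
    return len(paths)
```

With the downloaded files in a folder called vcards, vcards_to_csv("vcards", "contacts.csv") would produce the CSV in one call. Splitting each line at the first colon, instead of searching for fixed strings, is what lets it cope with differing TYPE parameters.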

Hope you guys enjoy this vCard parser, which does the vCard to CSV conversion job!
