Update my Contacts with Python: thinking through the options

Alright folks, that’s the bell.  When LinkedIn stops thinking of itself as a professional contact manager, you know there’s no profit in it, and it’s time to manage this stuff yourself.

Problem To Solve

I’ve been hemming and hawing for a couple of years, ever since Evernote shut down their Hello app, about how to remember who I’ve met and where I met them.  I’m a Meetup junkie (with no rehab in sight) and I’ve developed a decent network of friends and acquaintances that make it easy for me to attend new events and conferences in town – I’ll always “know” someone there (though not always remember why/how I know them or even what their name is).

When I first discovered Evernote Hello, it seemed like the perfect tool for me – it gave me a timeline view of all the people I’d met, with rich notes on all the events I’d seen them at and where those places were.  It never entirely gelled: its support for business card import came and went (pay for play mostly), and it was only good for those people who gave me enough info for me to link them.  Even with all those imperfections, I remember regularly scanning that list (from a quiet corner at a meetup/party/conference) before approaching someone I *knew* I’d seen before, but couldn’t remember why.  [Google Glass briefly promised to solve this problem for me too, but that tech is off somewhere, licking its wounds and promising to come back in ten years when we’re ready for it.]

What other options do I have, before settling in to “do it myself”?

  • Pay the big players, e.g. Salesforce, LinkedIn
    • Salesforce: smallest SKUs I could find @ $25/month [nope]
    • LinkedIn “Sales” SKU: $65/month [NOPE]
  • Get a cheap/trustworthy/likely-to-survive-more-than-a-year app
    • Plenty of apps I’ve evaluated that sound sketchy, or likely to steal your data, or are so under-funded that they’re likely to die off in a few months

Requirements

Do it myself then.  Now I’ve got a smaller problem set to solve:

  1. Enforce synchronization between my iPhone Contacts.app, the iCloud replica (which isn’t a perfect replica) and my Google Contacts (which are a VERY spotty replica).
    • Actually, let’s be MVP about this: all I *need* right now is a way of automating edits to Contacts on my iPhone.  I assume that the most reliable way of doing this is to make edits to the iCloud.com copy of the contact and let it replicate down to my phone.
    • the Google Contacts sync is a future-proofing move, and one that theoretically sounded free (just needed to flip a toggle on my iPhone profile), but which in practice seems to be built so badly that only about 20% of my contacts have ever sync’d with Google
  2. Add/update information to my contacts such as photos, “first met” context (who introduced, what event met at) and other random details they’ve confessed to me (other attempts to hook my memory) – *WITHOUT* linking my iPhone contacts with either LinkedIn or Facebook (who will of course forever scrape all that data up to their cloud, which I do *not* want to do – to them or me).

Test the Sync

How can I test my requirements in the cheapest way possible?

  • Make hand edits to the iCloud.com contacts and check that it syncs to the iPhone Contacts.app
    • Result: sync to iPhone within seconds
  • Make hand edits to contacts in Contacts.app and check that it syncs to iCloud.com contact
    • Result: sync to iCloud within seconds

OK, so once I have data that I want to add to an iCloud contact, and code (Python for me please!) that can write to iCloud contacts, it should be trivial to edit/append.

Here’s all the LinkedIn Data I Want

Data that’s crucial to remembering who someone is:

  • Date we first connected on LinkedIn
  • Tags
  • Notes
  • Picture

Additional data that can help me fill in context if I want to dig further:

  • current company
  • current title
  • Twitter ID
  • Web site addresses
  • Previous companies

And metadata that can help uniquely identify people when reading or writing from other directories:

  • Email address
  • Phone number
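
Here is a minimal sketch of how email address and phone number could serve as match keys between directories. The function names (`normalize_email`, `normalize_phone`, `match_keys`, `same_person`) are hypothetical, and the normalization rules are my assumption, not anything LinkedIn, iCloud or Google prescribe.

```python
import re


def normalize_email(email):
    """Lower-case and trim an email address so trivial differences do not block a match."""
    return email.strip().lower()


def normalize_phone(phone):
    """Keep only the digits of a phone number (assumption: punctuation is noise)."""
    return re.sub(r"\D", "", phone)


def match_keys(emails=(), phones=()):
    """Build a set of (kind, value) keys that identify one contact."""
    keys = set()
    for email in emails:
        value = normalize_email(email)
        if value:
            keys.add(("email", value))
    for phone in phones:
        value = normalize_phone(phone)
        if value:
            keys.add(("phone", value))
    return keys


def same_person(keys_a, keys_b):
    """Treat two contacts as the same person if they share at least one key."""
    return bool(keys_a & keys_b)


# Made-up example data, not real contacts.
linkedin_contact = match_keys(emails=["Ada@Example.com "], phones=["(206) 555-0100"])
icloud_contact = match_keys(emails=["ada@example.com"], phones=[])
print(same_person(linkedin_contact, icloud_contact))
```

One known gap in this sketch: a number stored with a country code ("+1 206 555 0100") will not match the same number stored without one, so phone matching would need more care than email matching.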

How to Get my LinkedIn connection data?

OK, so (as of 2016-12-15 at 12:30pm PST) there are three ways I can think of pulling down the data I’ve peppered into my LinkedIn connections:

  1. User Data Archive: request an export of your user data from LinkedIn
  2. LinkedIn API: request data for specified Connections using LinkedIn’s supported developer APIs
  3. Web Scraping: iterate over every Connection and pull fields via CSS using e.g. Beautiful Soup

User Data Archive

This *sounds* like the most efficient and straightforward way to get this data.  The “Relationship Section” announcement even implies that I’ll get everything I want:

If you want to download your existing Notes and Tags, you’ll have the option to do so through March 31, 2017…. Your notes and tags will be in the file named Contacts.

The initial data dump included everything except a Contacts.csv file.  The later Complete_LinkedInDataExport_12-16-2016 [ISO 8601 anyone?] included the data promised and nearly nothing else:

  • Connections.csv: First Name, Last Name, Email Address, Current Company, Current Position, Tags
  • Contacts.csv: First Name, Last Name, Email (mostly blank), Notes, Tags

I didn’t expect to get Picture, but I was hoping for Date First Connected, and while the rest of the data isn’t strictly necessary, it’s certainly annoying that LinkedIn is so friggin frugal.

Regardless, I have almost no other source for pictures for my professional contacts, and that is pretty essential for recalling someone I’ve met only a handful of times, so while helpful, this wasn’t sufficient.
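
For the data the export does contain, here is a minimal sketch of joining the two files on first and last name using only the standard csv module. It assumes the header row uses exactly the column names listed above; `merge_exports` and `clean` are hypothetical names, and matching on name alone will collide for two connections who share a name.

```python
import csv
import io


def clean(row, column):
    """Return a trimmed cell value, treating a missing column as empty."""
    return (row.get(column) or "").strip()


def name_key(row):
    """Case-insensitive (first, last) key used to join the two files."""
    return (clean(row, "First Name").lower(), clean(row, "Last Name").lower())


def merge_exports(connections_file, contacts_file):
    """Combine Connections.csv and Contacts.csv rows into one dict per person."""
    merged = {}
    for row in csv.DictReader(connections_file):
        merged[name_key(row)] = {
            "first_name": clean(row, "First Name"),
            "last_name": clean(row, "Last Name"),
            "email": clean(row, "Email Address"),
            "company": clean(row, "Current Company"),
            "position": clean(row, "Current Position"),
            "tags": clean(row, "Tags"),
            "notes": "",
        }
    for row in csv.DictReader(contacts_file):
        record = merged.get(name_key(row))
        if record is None:
            continue  # a note for someone who is not in Connections.csv
        record["notes"] = clean(row, "Notes")
        # Only fill gaps; never overwrite what Connections.csv already gave us.
        record["email"] = record["email"] or clean(row, "Email")
        record["tags"] = record["tags"] or clean(row, "Tags")
    return merged


# Made-up sample rows standing in for the real export files.
connections_csv = io.StringIO(
    "First Name,Last Name,Email Address,Current Company,Current Position,Tags\n"
    "Ada,Example,,Example Co,Engineer,meetup\n"
)
contacts_csv = io.StringIO(
    "First Name,Last Name,Email,Notes,Tags\n"
    "ada,example,ada@example.com,Met at a Python meetup,meetup\n"
)
people = merge_exports(connections_csv, contacts_csv)
print(people[("ada", "example")])
```

With the real export, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="", encoding="utf-8")`; the encoding is an assumption too.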

LinkedIn API

The next most reliable way to attack this data is to programmatically request it.  However, as I would’ve expected from this “roach motel” of user-generated data, they don’t even support an API to request all Connections from your user account (merely sign-in and submit data).

Where they do make reference to user data, it’s in a highly-regulated set of Member Profile fields:

  • With the r_basicprofile permission, you can get first-name, last-name, positions, picture-url plus some other data I don’t need
  • With the r_emailaddress permission, you can get the user’s primary email address
  • For developers accepted into “Apply with LinkedIn”, and with the r_fullprofile permission, you can further get date-of-birth and member-url-resources
  • For those “Apply with LinkedIn” developers who have the r_contactinfo permission, you can further get phone-numbers and twitter-accounts

After registering a new application, I am immediately given the ability to grant the following permissions to my app: r_basicprofile, r_emailaddress.  That’ll get me picture-url, if I can figure out a way to enumerate all the Connections for my account.

(A half-hour sorting through Chrome Dev Tools’ Network outputs later…)

Looks like there’s a handy endpoint that lets the browser enumerate pretty much all the data I want:

https://www.linkedin.com/connected/api/v2/contacts?start=40&count=10&fields=id%2Cname%2CfirstName%2ClastName%2Ccompany%2Ctitle%2Clocation%2Ctags%2Cemails%2Csources%2CdisplaySources%2CconnectionDate%2CsecureProfileImageUrl&sort=CREATED_DESC&_=1481999304007

That bears further investigation.
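
As a starting point for that investigation, here is a small sketch that rebuilds the URL above for any page of results, using only the parameters visible in the captured request. It sends nothing over the network. The endpoint is undocumented, so everything about its behaviour is an assumption: that start/count page through the list, that the trailing `_` parameter is just a cache-busting timestamp that can be left off, and that the caller somehow knows the total number of connections. `page_url` and `page_urls` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Taken verbatim from the request seen in Chrome Dev Tools.
BASE_URL = "https://www.linkedin.com/connected/api/v2/contacts"
FIELDS = [
    "id", "name", "firstName", "lastName", "company", "title", "location",
    "tags", "emails", "sources", "displaySources", "connectionDate",
    "secureProfileImageUrl",
]


def page_url(start, count=10):
    """Build the URL for one page of contacts, beginning at offset `start`."""
    query = urlencode({
        "start": start,
        "count": count,
        "fields": ",".join(FIELDS),  # urlencode turns each comma into %2C
        "sort": "CREATED_DESC",
    })
    return f"{BASE_URL}?{query}"


def page_urls(total, count=10):
    """Build the URLs needed to walk `total` contacts, `count` at a time."""
    return [page_url(start, count) for start in range(0, total, count)]


for url in page_urls(total=25, count=10):
    print(url)
```

Actually fetching these would still need the cookies of a signed-in browser session, which this sketch does not attempt.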

Web Scraping

While this approach doesn’t have the built-in restrictions of the LinkedIn APIs, there are at least three challenges I can foresee so far:

  1. LinkedIn requires authentication, and OAuth 2.0 at that (at least for API access).  Integrating OAuth into a Beautiful Soup script isn’t something I’ve heard of before, but I’m seeing some interesting code fragments and tutorials that could be helpful, and it appears that the requests package can do OAuth 1 & 2.
  2. LinkedIn has helpfully implemented the “infinite scroll” AJAX behaviour on the Connections page.
    • There are ways to work with this behaviour, but it sure feels cumbersome – to the point I almost feel like doing this work by hand would just be faster.
  3. Navigating automatically to each linked page (each Connection) from the Connections page isn’t something I am entirely confident about
    • Though I imagine it should be as easy as “for each Connection in Connections, load the page, then find the data with this CSS attribute, and store it in an array of whatever form you like”.  The mechanize package promises to make the link navigation easy.
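
The “for each Connection” loop from the third challenge could be sketched roughly like this. To keep it runnable without third-party packages, it swaps in the standard library’s html.parser for Beautiful Soup, and it works on HTML strings that have already been downloaded, so authentication and infinite scroll are not addressed. The class names ("full-name", "headline") and the sample markup are invented for illustration; LinkedIn’s real pages will differ.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Collect the text of the first element carrying each wanted class."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.found = {}
        self._current = None  # class whose text is being captured right now
        self._depth = 0       # open tags inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        for name in (dict(attrs).get("class") or "").split():
            if name in self.wanted and name not in self.found:
                self._current = name
                self._depth = 1
                self.found[name] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data


def scrape_profile(html, wanted_classes):
    """Return {class name: text} for one connection's page."""
    parser = ClassTextCollector(wanted_classes)
    parser.feed(html)
    return {name: text.strip() for name, text in parser.found.items()}


def scrape_connections(pages, wanted_classes):
    """Apply scrape_profile to every already-downloaded page."""
    return [scrape_profile(html, wanted_classes) for html in pages]


# Invented markup standing in for a downloaded profile page.
sample_page = (
    '<div><h1 class="full-name">Ada Example</h1>'
    '<p class="title headline">Engineer at <b>Example Co</b></p></div>'
)
print(scrape_connections([sample_page], ["full-name", "headline"]))
```

With Beautiful Soup, the whole collector class would collapse into a CSS selector lookup per field; the loop around it would stay the same.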

Am I Ready for This Much Effort?

It sure feels like there are a lot of barriers in the way of just collecting the info I’ve accumulated in LinkedIn about my connections.  Would it take me less time to just browse each connection page and hand copy/paste the data from LinkedIn to iCloud?  Almost certainly.  Putting together a Beautiful Soup + requests + various github modules solution would probably take me 20-30 hours, I’m guessing, from all the reading and piecing together code fragments from various sources, to debugging and troubleshooting, to making something that spits out the data and then automatically uploads it without mucking up existing data.

Kinda takes the fun out of it that way, doesn’t it?  I mean, the “glory” of writing code that’ll do something I haven’t found anyone else doing, that’s a little boost of ego and all.  Still, it’s hard to believe this kind of thing hasn’t been solved elsewhere – am I the only person with this bad of a memory, and this much of a drive to keep myself from looking like Leonard Shelby at every meetup?

What’s worse though, for embarking on this thing, is that I’d bet in six months’ time, LinkedIn and/or iCloud will have ‘broken’ enough of their site(s) that I wouldn’t be able to just re-use what I wrote the first time.  Maintenance of this kind of specialized/unique code feels pretty brutal, especially if no one else is expected to use it (or at least, I don’t have any kind of following to make it likely folks will find my stuff on github).

Still, I don’t think I can leave this itch entirely unscratched.  My gut tells me I should dig into that Contacts API first before embarking on the spelunking adventure that is Beautiful Soup.
