Update my Contacts with Python: exploring LinkedIn’s and iCloud’s Contact APIs

TL;DR Wow is it an adventure to decipher how to interact with undocumented web services like I found on LinkedIn and iCloud.  Migrating data from LinkedIn to iCloud looks possible, but I got stuck at implementing the PUT operation to iCloud using Python.

Background: Because I have a shoddy memory for details about all the people I meet, and because LinkedIn appears to be de-prioritizing their role as a professional contact manager, I want to make my iPhone Contacts my system of record for all data about people I meet professionally.  Which means scraping as much useful data as possible from LinkedIn and uploading it to iCloud Contacts (since my people-centric data is currently centered more around my iPhone than a Google Contacts approach).

In our last adventure, I stumbled across a surprisingly well-formed and useful API for pulling data from LinkedIn about my Connections:

https://www.linkedin.com/connected/api/v2/contacts?start=40&count=10&fields=id%2Cname%2CfirstName%2ClastName%2Ccompany%2Ctitle%2Clocation%2Ctags%2Cemails%2Csources%2CdisplaySources%2CconnectionDate%2CsecureProfileImageUrl&sort=CREATED_DESC&_=1481999304007

Available Data

Which upon inspection of the results, gives me a lot of the data I was hoping to import into my iCloud Contacts:

  • crucial: Date we first connected on LinkedIn (“connectionDate” as time-since-epoch), Tags (“tags” as list of dictionaries), Picture (“profileImageUrl” as URI), first name (“firstName” as string), last name (“lastName” as string)
  • want: current company (“company” as dictionary), current title (“title” as string)
  • metadata: phone number (“phoneNumbers” as dictionary)

What doesn’t it give?  Notes, Twitter ID, web site addresses, previous companies, email address.  [What else does it give that could be useful?  LinkedIn profile URL (“profileUrl” as the permanent URL, not the “friendly URL” that many of us have generated, such as https://www.linkedin.com/in/mikelonergan).  I can see how it would be helpful at a meetup to browse from my iPhone contacts to someone’s LinkedIn profile to refresh myself on their work history.  Creepy, desperate, but something I’ve done a few times when I’m completely blanking.]

What can I get from the User Data Archive?  Notes are found in the Contacts.csv, and email address is found in Connections.csv.  Matching those two files’ data together with what I can pull from the Contacts API shouldn’t be a challenge (concat firstName + lastName, and among the data set of my 684 contacts, I doubt I’ll find any collisions).  Then matching those records to my iCloud Contacts *should* be just a little harder (I expect to match 50% of my existing contacts by emailAddress, then another fraction by phone number; the rest will likely be new records for my Contacts, with maybe one or two that I’ll have to merge by hand at the end).
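The name-based matching described above could be sketched roughly like this. It is a minimal sketch, assuming each source has already been loaded into a list of dictionaries; the helper names (name_key, merge_by_name) and the key names used for the archive rows are placeholders, since the real column headers in Contacts.csv and Connections.csv may differ.

```python
def name_key(first_name, last_name):
    """Build a case-insensitive 'first last' key for matching records."""
    return " ".join((first_name + " " + last_name).lower().split())


def merge_by_name(api_contacts, archive_rows):
    """Merge archive rows into API contacts that share the same name key.

    Returns (merged, collisions): merged maps name key -> combined record,
    collisions lists name keys that appear more than once in api_contacts.
    """
    merged = {}
    collisions = []
    for contact in api_contacts:
        key = name_key(contact["firstName"], contact["lastName"])
        if key in merged:
            collisions.append(key)  # same name twice: needs a manual look
            continue
        merged[key] = dict(contact)

    for row in archive_rows:
        key = name_key(row["firstName"], row["lastName"])
        if key in merged:
            # Only fill in fields the API record does not already have
            for field, value in row.items():
                merged[key].setdefault(field, value)
    return merged, collisions
```

Reporting the collisions separately, rather than silently overwriting, makes it easy to check the guess that a few hundred contacts won’t contain duplicate names.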

Planning the “tracer bullet”

So what’s the smallest piece of code I can pull together to prove this scenario actually works?  It’ll need at least these features (assumes Python):

  1. can authenticate to LinkedIn via at least one supported protocol (e.g. OAuth 2.0)
  2. can pull down the first 10 JSON records from Contacts API and hold them in a list
  3. can enumerate the First + Last Name and pull out “title” for that record
  4. can authenticate to iCloud
    • Note: I may need to disable 2-factor authentication that is currently enabled on my account
  5. can find a matching First + Last Name in my iCloud Contacts
  6. can write the title field to the iCloud contact
    • Note: I’m worried least about existing data for the title field
  7. can upload the revised record to iCloud so that it replicates successfully to my iPhone

That should cover all the essential operations for the least-complicated data, without having to worry about edge cases like “what if the contact doesn’t exist in iCloud” or “what if there’s already data in the field I want to fill”.

Step 1: authenticate to LinkedIn

There are plenty of packages and modules on Github for accessing LinkedIn, but the ones I’ve evaluated all use the REST APIs, with their dual-secrets authentication mechanism, to get at the data.  (e.g. this one, this one, that one, another one).

Or am I making this more complicated than it is?  This python module simply used username + password in their call to an HTTP ‘endpoint’.  Let’s assume that judicious use of the requests package is sufficient for my needs.

I thought I’d build an Anaconda environment and a Jupyter notebook to experiment with the modules I’m looking at.  But when I attempted to install the requests package in my new Anaconda environment, I got back this error:

LinkError:
Link error: Error: post-link failed for: openssl-1.0.2j-0

Quick search turns up a couple of open conda issues that don’t give me any immediate relief. OK, forget this for a bit – the “root” kernel will do fine for the moment.

Next let’s try this code and see what we get back:

import requests
r = requests.get('https://www.linkedin.com/connected/api/v2/contacts?start=40&count=10&fields=id%2Cname%2CfirstName%2ClastName%2Ccompany%2Ctitle%2Clocation%2Ctags%2Cemails%2Csources%2CdisplaySources%2CconnectionDate%2CsecureProfileImageUrl&sort=CREATED_DESC&_=1481999304007', auth=('mikethecanuck@gmail.com', 'linkthis'))
r.status_code

Output is simply “401”.  Dang, authentication wasn’t *quite* that easy.

So I tried that URL in an incognito tab, and it displays this to me without an existing auth cookie:

{"status":"Member is not Logged in."}

And as soon as I open another tab in that incognito window and authenticate to the linkedin.com site, the first tab with that contacts query returns the detailed JSON I was expecting.

Digging deeper, it appears that when I authenticate to https://www.linkedin.com through the incognito tab, I receive back one cookie labelled “lidc”, and that an “lidc” cookie is also sent to the server on the successful request to the contacts API.

But setting the cookie manually with the value returned from a previous request still leads to 401 response:

url = 'https://www.linkedin.com/connected/api/v2/contacts?start=40&count=10&fields=id%2Cname%2CfirstName%2ClastName%2Ccompany%2Ctitle%2Clocation%2Ctags%2Cemails%2Csources%2CdisplaySources%2CconnectionDate%2CsecureProfileImageUrl&sort=CREATED_DESC&_=1481999304007'
cookies = dict(lidc="b=OGST00:g=43:u=1:i=1482261556:t=1482347956:s=AQGoGetJeZPEDz3sJhm_2rQayX5ZsILo")
r2 = requests.get(url, cookies=cookies)

I tried two other approaches that people have used in the past – some even successfully with certain pages on LinkedIn – but eventually I decided that I’m getting ratholed on trying to reverse-engineer an undocumented (and more than likely unusually-constructed) API, when I can quite easily dump the data out of the API by hand and then do the rest of my work successfully.  (Yes I know that disqualifies me as a ‘real coder’, but I think we both know I was never going to win that medal – but I will win the medal for “results-oriented” not “pedantically chasing my tail”.)

Thus, knowing that I’ve got 684 connections on LinkedIn (saw that in the footer of a response), I submitted the following queries and copy-pasted the results into 4 separate .JSON files for offline processing:

https://www.linkedin.com/connected/api/v2/contacts?start=0&count=200&fields=id%2Cname%2CfirstName%2ClastName%2Ccompany%2Ctitle%2Clocation%2Ctags%2Cemails%2Csources%2CdisplaySources%2CconnectionDate%2CsecureProfileImageUrl&sort=CREATED_DESC&_=1481999304007

https://www.linkedin.com/connected/api/v2/contacts?start=200&count=200&fields=id%2Cname%2CfirstName%2ClastName%2Ccompany%2Ctitle%2Clocation%2Ctags%2Cemails%2Csources%2CdisplaySources%2CconnectionDate%2CsecureProfileImageUrl&sort=CREATED_DESC&_=1481999304007

https://www.linkedin.com/connected/api/v2/contacts?start=400&count=200&fields=id%2Cname%2CfirstName%2ClastName%2Ccompany%2Ctitle%2Clocation%2Ctags%2Cemails%2Csources%2CdisplaySources%2CconnectionDate%2CsecureProfileImageUrl&sort=CREATED_DESC&_=1481999304007

https://www.linkedin.com/connected/api/v2/contacts?start=600&count=200&fields=id%2Cname%2CfirstName%2ClastName%2Ccompany%2Ctitle%2Clocation%2Ctags%2Cemails%2Csources%2CdisplaySources%2CconnectionDate%2CsecureProfileImageUrl&sort=CREATED_DESC&_=1481999304007
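For the record, those four URLs differ only in the start parameter, so they could be generated rather than typed. This is a small sketch, not something required for the copy-paste approach; the function name page_urls is made up, and it leaves out the trailing `_` parameter on the assumption that it is only a cache-busting timestamp.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.linkedin.com/connected/api/v2/contacts"
FIELDS = ("id,name,firstName,lastName,company,title,location,tags,emails,"
          "sources,displaySources,connectionDate,secureProfileImageUrl")


def page_urls(total, page_size=200):
    """Return one Contacts API URL per page needed to cover `total` records."""
    urls = []
    for start in range(0, total, page_size):
        query = urlencode({
            "start": start,
            "count": page_size,
            "fields": FIELDS,  # urlencode turns the commas into %2C
            "sort": "CREATED_DESC",
        })
        urls.append(BASE_URL + "?" + query)
    return urls


for url in page_urls(684):
    print(url)
```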

Oddly, the four sets of results contain 196, 198, 200 and 84 items – they assert that I have 684 connections, but can only return 678 of them?  I guess that’s one of the consequences of dealing with a “free” data repository (even if it started out as mine).

Step 2: read the JSON file and parse a list of connections

I’m sure I could be more efficient than this, but as far as getting a working result, here’s the arrangement of code I used to start accessing structured list data from the Contacts API output I shunted to a file:

import json

# Read the saved Contacts API output and pull out the list of connection records
with open("Connections-API-results.json") as contacts_file:
    contacts_json = json.load(contacts_file)
contacts_list = contacts_json['values']
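Since the API output ended up in four separate files, the same idea can be extended to read them all into one list. A minimal sketch, assuming every file has the same top-level "values" key; the function name load_contacts and the file names in the usage note are hypothetical.

```python
import json


def load_contacts(paths):
    """Read each JSON file and concatenate the records under its 'values' key."""
    contacts = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            contacts.extend(json.load(handle)["values"])
    return contacts
```

Usage would look something like contacts_list = load_contacts(["contacts-1.json", "contacts-2.json"]), after which len(contacts_list) gives a quick check against the total that LinkedIn reports.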

Step 3: pulling data out of the list of connections

It turns out this is pretty easy, e.g.:

for contact in contacts_list:
 print(contact['name'], contact['title'])

Messing around a little further, trying to make sense of the connectionDate value from each record, I found that this returns an ISO 8601-style date string that I can use later:

import time
# connectionDate is in milliseconds since the epoch, so divide by 1000 first
print(time.strftime("%Y-%m-%d", time.localtime(contacts_list[15]['connectionDate'] / 1000)))

e.g. for the record at index “15”, that returned 2007-03-15.
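One caveat with time.localtime is that the result depends on the time zone of the machine running the code, so a timestamp close to midnight could land on a different day elsewhere. Here is a sketch of the same conversion pinned to UTC with the datetime module. Treating the value as UTC is an assumption on my part, and both the function name and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timezone


def connection_date(epoch_ms):
    """Convert a millisecond epoch timestamp to a YYYY-MM-DD string (UTC)."""
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


# Illustrative value only, not taken from the real API output
print(connection_date(1173960000000))
```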

Data issue: it turns out that not all records have a profileImageUrl key (e.g. for those oddball security geeks among my contacts who refuse to publish a photo on their LinkedIn profile), so I got to handle my first expected exception 🙂

Assembling all the useful data I wanted for all my Connections into a single list of dictionaries, I was able to make the following work (as you can find in my repo):

stripped_down_connections_list = []

for contact in contacts_list:
    first_name = contact['firstName']
    last_name = contact['lastName']
    title = contact['title']
    company = contact['company']['name']
    # connectionDate is in milliseconds since the epoch
    date_first_connected = time.strftime("%Y-%m-%d", time.localtime(contact['connectionDate'] / 1000))

    # Not every record has a profile picture
    picture_url = ""
    try:
        picture_url = contact['profileImageUrl']
    except KeyError:
        pass

    # Keep just the name of each tag
    tags = []
    for tag in contact['tags']:
        tags.append(tag['name'])

    # Keep only the first phone number, if there is one
    phone_number = ""
    try:
        phone_number = {"type" : contact['phoneNumbers'][0]['type'],
                        "number" : contact['phoneNumbers'][0]['number']}
    except IndexError:
        pass

    stripped_down_connections_list.append({"firstName" : first_name,
                                           "lastName" : last_name,
                                           "title" : title,
                                           "company" : company,
                                           "connectionDate" : date_first_connected,
                                           "profileImageUrl" : picture_url,
                                           "tags" : tags,
                                           "phoneNumber" : phone_number})

Step 4: Authenticate to iCloud

For this step, I’m working with the pyicloud package, hoping that they’ve worked out both (a) Apple’s two-factor authentication and (b) read/write operations on iCloud Contacts.

I set up yet another Jupyter notebook and tried out a couple of methods to import PyiCloud (based on these suggestions here), at least one of which does a fine job.  With picklepete’s suggested 2FA code added to the mix, I appear to be able to complete the authentication sequence to iCloud.

APPLE_ID = 'REPLACE@ME.COM'
APPLE_PASSWORD = 'REPLACEME'

import sys  # needed for the sys.exit() calls below
from importlib.machinery import SourceFileLoader

foo = SourceFileLoader("pyicloud", "/Users/mike/code/pyicloud/pyicloud/__init__.py").load_module()
api = foo.PyiCloudService(APPLE_ID, APPLE_PASSWORD)

if api.requires_2fa:
    import click
    print("Two-factor authentication required. Your trusted devices are:")

    devices = api.trusted_devices
    for i, device in enumerate(devices):
        print(" %s: %s" % (i, device.get('deviceName',
        "SMS to %s" % device.get('phoneNumber'))))

    device = click.prompt('Which device would you like to use?', default=0)
    device = devices[device]
    if not api.send_verification_code(device):
        print("Failed to send verification code")
        sys.exit(1)

    code = click.prompt('Please enter validation code')
    if not api.validate_verification_code(device, code):
        print("Failed to verify verification code")
        sys.exit(1)

Step 5: matching on First + Last with iCloud

Caveat: there are a number of my contacts who have appended titles, certifications etc to their lastName field in LinkedIn, such that I won’t be able to match them exactly against my cloud-based contacts.
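One way to soften that caveat is to normalize the last name before comparing. This is a rough sketch under two assumptions: the extras usually follow a comma or sit at the end of the field, and a short hand-maintained list of credential abbreviations is good enough. The function names and the list itself are hypothetical.

```python
def clean_last_name(last_name, credentials=("CISSP", "CISA", "PMP", "MBA", "PHD")):
    """Drop anything after a comma, then any trailing known credential tokens."""
    base = last_name.split(",")[0]
    words = base.split()
    # Never strip the final remaining word, so a real surname always survives
    while len(words) > 1 and words[-1].replace(".", "").upper() in credentials:
        words.pop()
    return " ".join(words)


def match_key(first_name, last_name):
    """Lower-cased (first, last) tuple to compare LinkedIn and iCloud records."""
    return (first_name.strip().lower(), clean_last_name(last_name).lower())


print(match_key("Jane ", "Doe, CISSP, PMP"))
```

Anything this doesn’t catch would fall through to the “merge by hand” pile.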

I’m not even worried about this step, because I quickly got worried about…

Step 6: write to the iCloud contacts (?)

Here’s where I’m stumped: I don’t think the PyiCloud package has any support for non-GET operations against the iCloud Contacts service.  There appears to be support for POST in the Reminders module, but not in any of the other services modules (including Contacts).

So I sniffed the wire traffic in Chrome Dev Tools, to see what’s being done when I make an update to any iCloud.com contact.  There are two possible operations: a POST method call for a new contact, or a PUT method call for an update to an existing contact.

Here’s the Request Payload for a new contact:

{"contacts":[{"contactId":"2EC49301-671B-431B-BC8C-9DE6AE15D21D","firstName":"Tony","lastName":"Stank","companyName":"Stark Enterprises","isCompany":false}]}

Here’s the Request Payload for an update to that existing contact (I added homepage URL):

{"contacts":[{"firstName":"Tony","lastName":"Stank","contactId":"2EC49301-671B-431B-BC8C-9DE6AE15D21D","prefix":"","companyName":"Stark Enterprises","etag":"C=1432@U=afe27ad8-80ce-4ba8-985e-ec4e365bc6d3","middleName":"","isCompany":false,"suffix":"","urls":[{"label":"HOMEPAGE","field":"http://stark.com"}]}]}

There are four requests being made for either type of change to iCloud contacts (at least via the iCloud.com web interface that I am using as a model for what the code should be doing):

  1. https://p28-contactsws.icloud.com/co/contacts/card/
  2. https://webcourier.push.apple.com/aps
  3. https://p28-contactsws.icloud.com/co/changeset
  4. https://feedbackws.icloud.com/reportStats

Here are the details for these calls when I create a new Contact:

  1. Request URL: https://p28-contactsws.icloud.com/co/contacts/card/?clientBuildNumber=16HProject79&clientId=63D7078B-F94B-4AB6-A64D-EDFCEAEA6EEA&clientMasteringNumber=16H71&clientVersion=2.1&dsid=197715384&prefToken=914266d4-387b-4e13-a814-7e1b29e001c3&syncToken=DAVST-V1-p28-FT%3D-%40RU%3Dafe27ad8-80ce-4ba8-985e-ec4e365bc6d3%40S%3D1426
    Request Payload: {"contacts":[{"contactId":"E2DDB4F8-0594-476B-AED7-C2E537AFED4C","urls":[{"label":"HOMEPAGE","field":"http://apple.com"}],"phones":[{"label":"MOBILE","field":"(212) 555-1212"}],"emailAddresses":[{"label":"WORK","field":"johnny.appleseed@apple.com"}],"firstName":"Johnny","lastName":"Appleseed","companyName":"Apple","notes":"Dummy contact for iCloud automation experiments","isCompany":false}]}
  2. Request URL: https://p28-contactsws.icloud.com/co/changeset?clientBuildNumber=16HProject79&clientId=63D7078B-F94B-4AB6-A64D-EDFCEAEA6EEA&clientMasteringNumber=16H71&clientVersion=2.1&dsid=197715384&prefToken=914266d4-387b-4e13-a814-7e1b29e001c3&syncToken=DAVST-V1-p28-FT%3D-%40RU%3Dafe27ad8-80ce-4ba8-985e-ec4e365bc6d3%40S%3D1427
  3. Request URL: https://webcourier.push.apple.com/aps?tok=bc3dd94e754fd732ade052eead87a09098d3309e5bba05ed24272ede5601ae8e&ttl=43200
  4. Request URL: https://feedbackws.icloud.com/reportStats
    Request Payload: {"stats":[{"httpMethod":"POST","statusCode":200,"hostname":"www.icloud.com","urlPath":"/co/contacts/card/","clientTiming":395,"uncompressedResponseSize":14469,"region":"OR","country":"US","time":"Wed Dec 28 2016 12:13:48 GMT-0800 (PST) (1482956028436)","timezone":"PST","browserLocale":"en-us","statName":"contactsRequestInfo","sessionID":"63D7078B-F94B-4AB6-A64D-EDFCEAEA6EEA","platform":"desktop","appName":"contacts","isLiteAccount":false},{"httpMethod":"POST","statusCode":200,"hostname":"www.icloud.com","urlPath":"/co/changeset","clientTiming":237,"uncompressedResponseSize":2,"region":"OR","country":"US","time":"Wed Dec 28 2016 12:13:48 GMT-0800 (PST) (1482956028675)","timezone":"PST","browserLocale":"en-us","statName":"contactsRequestInfo","sessionID":"63D7078B-F94B-4AB6-A64D-EDFCEAEA6EEA","platform":"desktop","appName":"contacts","isLiteAccount":false}]}

I am 99% sure that the only request that actually changes the Contact data is the first one (https://p28-contactsws.icloud.com/co/contacts/card/), so I’ll ignore the other three calls from here on out.

Here are the details of the first request when I edit an existing Contact:

Request URL: https://p28-contactsws.icloud.com/co/contacts/card/?clientBuildNumber=16HProject79&clientId=792EFA4A-5A0D-47E9-A1A5-2FF8FFAF603A&clientMasteringNumber=16H71&clientVersion=2.1&dsid=197715384&method=PUT&prefToken=914266d4-387b-4e13-a814-7e1b29e001c3&syncToken=DAVST-V1-p28-FT%3D-%40RU%3Dafe27ad8-80ce-4ba8-985e-ec4e365bc6d3%40S%3D1427
Request Payload: {"contacts":[{"lastName":"Appleseed","notes":"Dummy contact for iCloud automation experiments","contactId":"E2DDB4F8-0594-476B-AED7-C2E537AFED4C","prefix":"","companyName":"Apple","phones":[{"field":"(212) 555-1212","label":"MOBILE"}],"isCompany":false,"suffix":"","firstName":"Johnny","urls":[{"field":"http://apple.com","label":"HOMEPAGE"},{"label":"HOME","field":"http://johnny.name"}],"emailAddresses":[{"field":"johnny.appleseed@apple.com","label":"WORK"}],"etag":"C=1427@U=afe27ad8-80ce-4ba8-985e-ec4e365bc6d3","middleName":""}]}
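Judging only by the captured payload above, an edit seems to send the whole contact record back, including its contactId and an etag. Here is a sketch of building such a body in Python. It only assembles the JSON and does not talk to iCloud; the function name update_payload is made up, and the idea that the etag has to come from an earlier server response is my assumption, not something I have confirmed.

```python
import json


def update_payload(existing_contact, **changes):
    """Build the JSON body for an edit: the full existing record plus changes.

    existing_contact must already carry the contactId and etag fields,
    since both appear in the captured PUT payload.
    """
    for required in ("contactId", "etag"):
        if required not in existing_contact:
            raise ValueError("existing contact is missing " + required)
    contact = dict(existing_contact)  # copy, so the caller's record is untouched
    contact.update(changes)
    return json.dumps({"contacts": [contact]})


example = {
    "contactId": "E2DDB4F8-0594-476B-AED7-C2E537AFED4C",
    "etag": "C=1427@U=afe27ad8-80ce-4ba8-985e-ec4e365bc6d3",
    "firstName": "Johnny",
    "lastName": "Appleseed",
}
print(update_payload(example, companyName="Apple"))
```

Actually sending it would still need the query parameters (dsid, syncToken, method=PUT and so on) and the session cookies seen in the Request URL, none of which this sketch attempts.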

So here’s what’s puzzling me so far: both the POST (create) and PUT (edit) operations include a contactId parameter.  Its value is the same from POST to PUT (i.e. I believe that means it’s referencing the same record).  When I create a second new Contact, the contactId is different from the contactId submitted in the Request Payload for the first new Contact (so it’s presumably not a dummy value).  And yet when I look at the request/response for the initial page load when I click “+” and “New Contact”, I don’t see a request sent from the browser to the server (so the server isn’t sending down a contactId – not at that moment at least – perhaps it’s cached earlier?).

Explained another way, this is how I believe the sequence works (based on repeated analysis of the network traffic from Chrome to the iCloud endpoint and back):

  1. User loads icloud.com, Contacts page (#contacts), clicks “+” and selects “New Contact”
    • Browser sends no request, but rather builds the New Contact form from cached code
  2. User adds data and clicks the Done button for the new Contact
    • Browser sends POST request to https://p28-contactsws.icloud.com/co/contacts/card/ with a bunch of form data on the URL, a whole raft of cookies and the JSON request payload [including contactId=x]
    • Server sends response
  3. User clicks Edit on that new contact, updates some data and clicks Done
    • Browser sends PUT request to https://p28-contactsws.icloud.com/co/contacts/card/ with form data, cookies and JSON request payload [including the same contactId=x]
    • Server sends response

So the question is: if I’m creating a net-new Contact, how does the web client get a valid contactId that iCloud will accept?  Near as I can figure, digging through the javascript-packed.js this page uses, this is the function that generates a UUID at the client:

Contacts.Contact = Contacts.Record.extend({
 primaryKey: "contactId",
 contactId: CW.Record.attr(String, {
 defaultValue: function() {
 return CW.upperCaseUUID()
 }
 })

Using this function (IIUC):

UUID: function() {
 var e = new Array(36),
 t = 0,
 n = ["8", "9", "a", "b"];
 if (window.crypto && window.crypto.getRandomValues) {
 var r = new Uint8Array(18);
 crypto.getRandomValues(r);
 for (t = 0; t < 18; t++) e[t * 2 + 1] = (r[t] >> 4).toString(16), e[t * 2] = (r[t] & 15).toString(16);
 e[19] = n[r[9] >> 6]
 } else {
 while (t < 36) e[t] = (Math.random() * 16 | 0).toString(16), t++;
 e[19] = n[Math.random() * 4 | 0]
 }
 return e[8] = e[13] = e[18] = e[23] = "-", e[14] = "4", e.join("")
 }

[Aside: I sincerely hope this is a standard library for UUID, not something Apple wrote themselves, in case I ever need to generate iCloud-compatible UUIDs.]
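For what it’s worth, the JavaScript above forces position 14 to “4” and position 19 to one of 8, 9, a or b, which is the layout of a random (version 4) UUID. Assuming CW.upperCaseUUID() does nothing more than upper-case that string, Python’s standard uuid module should produce the same format. Whether iCloud would accept an ID generated this way is untested, and the function name below is my own.

```python
import uuid


def upper_case_uuid():
    """Random (version 4) UUID in upper case, e.g. as a candidate contactId."""
    return str(uuid.uuid4()).upper()


print(upper_case_uuid())
```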

Whoa – Pause

I need to take a step back and re-examine my goals and what I can specifically address.  I have learned a lot about both LinkedIn and iCloud, but I didn’t set out to recreate them, just find a way to make consistent use of the data I already have.


3 thoughts on “Update my Contacts with Python: exploring LinkedIn’s and iCloud’s Contact APIs”

    1. Thanks for the question – it gave me a good reason to dig in and try to manipulate the LinkedIn “fields” query parameter. (Emphasis on the word “try”.)

      I tried adding the keyword “industry” to the comma-separated value, and noticed that none of my response items included it.

      Then I carved down the comma-separated value to a very short list, to see if the API was responsive to *fewer* rather than *more* fields, like so (choosing only id, name, company and title, for example): https://www.linkedin.com/connected/api/v2/contacts?start=100&count=1&fields=id%2Cname%2Ccompany%2Ctitle&sort=CREATED_DESC&_=1481999304007

      Turns out, the “fields” parameter must be hard-coded. I can manipulate the other query parameters like start, count, sort – but I get back the same data fields for each contact no matter what I put in the fields parameter.

      Maybe this was once flexible but was hard-coded due to abuse, or maybe they always intended to make it responsive but never implemented that additional code. Either way, you get what you get. Sorry if I’d given you false hope.

      The only other way I can think of to obtain the Industry data is to use a web scraper like Beautiful Soup + a lot of Python code (more than I have accumulated so far, anyway).
