How to get data from a public Instagram profile
Recently, while building my portfolio, I wanted to show something about my hobbies. Right from the start I had the idea of showing my Instagram posts; it made perfect sense! I had heard from many devs about the complexity of working with Instagram’s Graph API and the Instagram Basic Display API. I understand the reasons for that complexity and the need for authentication, and I don’t want to get into the merits of it here; maybe I’ll write about it in a separate article. However, my profile is public (if you are curious, you can follow me there: https://www.instagram.com/carlos.henrique.reis.98/), so I didn’t want to be forced through all the bureaucracy of the Instagram API. After analyzing how the browser loads an Instagram profile, I found an alternative way to get all my public posts, which I explain in this article.
Hands-on!
Let’s start by understanding how Instagram loads profile information when the browser requests https://www.instagram.com/{userName}.
Looking at the Network tab of the browser’s developer tools, we see that the first request goes to this route:
https://www.instagram.com/carlos.henrique.reis.98/?__a=1
Here the magic begins: incredible as it seems, this route returns all the data about the public account with a simple GET request, without any authentication!
The route returns a JSON document with lots of information, and everything we need is inside the graphql.user object:
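To make this concrete, here is a minimal Python sketch of that unauthenticated GET request, assuming the route still responds the way this article describes. The function names and the `User-Agent` header are illustrative assumptions, not something the article says Instagram requires.

```python
import json
import urllib.request

# Route described in this article; assumed to still answer without authentication.
PROFILE_URL_TEMPLATE = "https://www.instagram.com/{user_name}/?__a=1"


def profile_url(user_name: str) -> str:
    """Build the ?__a=1 URL for a public profile."""
    return PROFILE_URL_TEMPLATE.format(user_name=user_name)


def fetch_profile(user_name: str) -> dict:
    """Send a plain GET request (no token, no cookies) and decode the JSON body."""
    request = urllib.request.Request(
        profile_url(user_name),
        # Assumption: a browser-like User-Agent; the article does not say one is needed.
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Example (performs a real network request):
# user = fetch_profile("carlos.henrique.reis.98")["graphql"]["user"]
```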
{
  "graphql":{
    ...
    "user":{
      ...
      "id":"...",
      "biography":"...",
      "edge_followed_by":{
        "count":311
      },
      "edge_follow":{
        "count":214
      },
      "full_name":"Carlos Henrique Reis",
      "profile_pic_url":"...",
      "profile_pic_url_hd":"...",
      "username":"carlos.henrique.reis.98",
      "edge_owner_to_timeline_media":{
        "count":138,
        "page_info":{
          "has_next_page":true,
          "end_cursor":"..."
        },
        "edges":[...]
      }
    }
  },
  ...
}
Basically, everything that is public on your profile is present here. In my case, I am interested in the following fields:
- graphql.user.id: user ID.
- graphql.user.biography: profile biography.
- graphql.user.edge_followed_by.count: number of followers.
- graphql.user.edge_follow.count: number of accounts the user follows.
- graphql.user.full_name: user’s full name.
- graphql.user.username: username (nickname).
- graphql.user.profile_pic_url and graphql.user.profile_pic_url_hd: URLs of the profile photo.
- graphql.user.edge_owner_to_timeline_media: holds all the post data; the posts themselves are in its edges array.
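The field list above maps naturally onto a small data structure. Below is a sketch of a hypothetical `parse_profile` helper that picks those fields out of the decoded `?__a=1` response, assuming the response has the shape shown earlier.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """The subset of graphql.user fields listed above (hypothetical structure)."""
    user_id: str
    biography: str
    followers: int
    following: int
    full_name: str
    username: str
    profile_pic_url: str
    profile_pic_url_hd: str
    post_count: int
    posts: list  # the first page of post edges only


def parse_profile(payload: dict) -> Profile:
    """Extract the fields of interest from the decoded ?__a=1 response."""
    user = payload["graphql"]["user"]
    media = user["edge_owner_to_timeline_media"]
    return Profile(
        user_id=user["id"],
        biography=user["biography"],
        followers=user["edge_followed_by"]["count"],
        following=user["edge_follow"]["count"],
        full_name=user["full_name"],
        username=user["username"],
        profile_pic_url=user["profile_pic_url"],
        profile_pic_url_hd=user["profile_pic_url_hd"],
        post_count=media["count"],
        posts=media["edges"],
    )
```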
Notice that Instagram doesn’t load all posts at once; it uses pagination. If we inspect the edges array, we see that only the first 12 posts came back. Shortly we will see how to fetch the next 12, and so on, until all the posts are loaded!
Another interesting piece of information is the page_info object. It contains the has_next_page field, which tells us whether there are still posts to load, and the end_cursor field, which holds the cursor (an opaque hash) needed to load the next page of posts.
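Reading page_info reduces to a small check. A sketch with the hypothetical helper name `next_cursor`, which returns the cursor for the next page, or `None` once the last page has been reached:

```python
from typing import Optional


def next_cursor(media: dict) -> Optional[str]:
    """Return end_cursor if another page exists, otherwise None.

    `media` is the graphql.user.edge_owner_to_timeline_media object.
    """
    page_info = media.get("page_info", {})
    if page_info.get("has_next_page"):
        return page_info.get("end_cursor")
    return None
```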
The https://www.instagram.com/{userName}/?__a=1 route returns much more data. Feel free to explore the JSON, but for now we only need these fields.
As mentioned earlier, this route returns only the first 12 posts. If we clear the Network tab and scroll down the page in the browser, we see that another route is called:
This route returns the next 12 posts and has the following structure:
Compared to the first request, this one is a bit more complex because it takes many more parameters. Decoding the URL and splitting it into its parts, we get:
- Endpoint: https://www.instagram.com/graphql/query/
- Request method: GET
- Parameters:
{
  "query_hash":"56a7068fea504063273cc2120ffd54f3",
  "variables":{
    "id":"1476919210",
    "first":12,
    "after":"QVFBdzl5NGZscTZzUzh0R21jbGJENUJRYVkya0..."
  }
}
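Decoding the captured URL by hand is error-prone because `variables` is a JSON document that has been percent-encoded inside the query string. A minimal sketch using Python’s `urllib.parse`, assuming the captured URL has the usual `?query_hash=...&variables=...` shape; `split_graphql_url` is a hypothetical name:

```python
import json
from urllib.parse import parse_qs, urlsplit


def split_graphql_url(url: str) -> dict:
    """Split a captured /graphql/query/ URL into its endpoint and parameters."""
    parts = urlsplit(url)
    # parse_qs percent-decodes the values; keep the first value of each parameter.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # `variables` is itself a JSON document, so decode it a second time.
    if "variables" in params:
        params["variables"] = json.loads(params["variables"])
    return {
        "endpoint": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "parameters": params,
    }
```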
Analyzing the parameters of this request, I was intrigued by query_hash: I couldn’t find any pattern in it, much less figure out where it comes from. After a little research, I found out that this parameter identifies the query to be executed and that its value changes over time. According to the answer to a Stack Overflow question on the subject:
The persisted query is used to improve GraphQL network performance by reducing the request size.
That makes sense, not least because GraphQL was developed by Facebook and is consequently used on Instagram. Continuing my research, I found that query_hash can be replaced by query_id. This parameter is much easier to work with, since its values follow a much clearer pattern. Below is a table with some values that this parameter accepts:
Note that we are interested in fetching the posts shown in the profile feed.
Next, let’s analyze the other parameter, variables:
"variables":{
  "id":"1476919210",
  "first":12,
  "after":"QVFBdzl5NGZscTZzUzh0R21jbGJENUJRYVkya0..."
}
In this variables object, first is the number of posts per page (12), while id and after both come from the response to the previous request:
- after: graphql.user.edge_owner_to_timeline_media.page_info.end_cursor
- id: graphql.user.id
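Putting those two values into variables and percent-encoding the compact JSON produces exactly the kind of route shown next. A sketch, assuming the query_id value from this article is still accepted (it may change over time); `feed_page_url` is a hypothetical helper:

```python
import json
from urllib.parse import quote

# Values taken from the route in this article; the query_id may change over time.
GRAPHQL_ENDPOINT = "https://www.instagram.com/graphql/query/"
FEED_QUERY_ID = "17842794232208280"


def feed_page_url(user_id: str, end_cursor: str, page_size: int = 12) -> str:
    """Build the /graphql/query/ URL that returns the next page of posts."""
    variables = {"id": user_id, "first": page_size, "after": end_cursor}
    # Compact JSON (no spaces), then percent-encode every reserved character.
    encoded = quote(json.dumps(variables, separators=(",", ":")), safe="")
    return f"{GRAPHQL_ENDPOINT}?query_id={FEED_QUERY_ID}&variables={encoded}"
```

Here `user_id` is graphql.user.id and `end_cursor` is the end_cursor from the most recent page_info.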
So we can use the following route:
https://www.instagram.com/graphql/query/?query_id=17842794232208280&variables=%7B%22id%22%3A%22<graphql.user.id>%22%2C%22first%22%3A12%2C%22after%22%3A%22<graphql.user.edge_owner_to_timeline_media.page_info.end_cursor>%22%7D
It returns:
{
  "data":{
    "user":{
      "edge_owner_to_timeline_media":{
        "count":140,
        "page_info":{
          "has_next_page":true,
          "end_cursor":"..."
        },
        "edges":[...]
      }
    }
  },
  "status":"ok"
}
The only thing that changes in the structure of the returned JSON is the outer object: before, everything was inside graphql, and now it is inside data.
Sensational! Note that, without any authentication method, we developed a way to pull all posts from a public Instagram profile!
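The steps above can be combined into one loop: fetch the profile, yield its posts, and keep requesting the next page with the id from the first response and the latest end_cursor until has_next_page is false. This is a sketch assuming both routes behave as described in this article and need no authentication; the function names and the `User-Agent` header are hypothetical, and `fetch` is injectable so the paging logic can be exercised without the network.

```python
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import quote

# Routes described in this article; they may change or stop working.
PROFILE_URL = "https://www.instagram.com/{user_name}/?__a=1"
FEED_URL = ("https://www.instagram.com/graphql/query/"
            "?query_id=17842794232208280&variables={variables}")


def http_get_json(url: str) -> dict:
    """Plain GET without authentication, decoded as JSON."""
    # Assumption: a browser-like User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def user_object(payload: dict) -> dict:
    """The first response wraps `user` in `graphql`; later pages wrap it in `data`."""
    root = payload.get("graphql") or payload["data"]
    return root["user"]


def iter_posts(user_name: str,
               fetch: Callable[[str], dict] = http_get_json) -> Iterator[dict]:
    """Yield every post edge, following end_cursor until has_next_page is false."""
    user = user_object(fetch(PROFILE_URL.format(user_name=user_name)))
    user_id = user["id"]  # only present in the first response
    while True:
        media = user["edge_owner_to_timeline_media"]
        yield from media["edges"]
        page_info = media["page_info"]
        if not page_info["has_next_page"]:
            return
        variables = {"id": user_id, "first": 12, "after": page_info["end_cursor"]}
        encoded = quote(json.dumps(variables, separators=(",", ":")), safe="")
        user = user_object(fetch(FEED_URL.format(variables=encoded)))
```

For example, `list(iter_posts("carlos.henrique.reis.98"))` would collect every post edge of a public profile, issuing one request per page of 12.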
If you’re curious about how to use these “APIs” in practice, you can visit my portfolio and see how I list my posts in the hobbies tab: https://carlos-henreis.github.io/
In the next article, I hope to build a structure (in terms of UX/UI design) using the APIs presented in this article, implemented with VueJS and Vuetify.