Get device type (phone/tablet/other) by brand name
I suggest a simpler approach. Any device that communicates wirelessly has to be certified; in the US, the certifying body is the FCC.
They have an API:
https://data.fcc.gov:443/api/accessibilityclearinghouse/product/searchProducts?api_key=23232323&format=json&rowPerPage=20&searchString=galaxy%20s4
This returns, among other fields:
"maker": "Samsung",
See it here: https://ach.fcc.gov/for-developers//#!/API/product_searchProducts_get
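As a rough sketch of how the request could be issued from Python: the endpoint and query parameters come from the URL above, while the helper names `build_search_url`, `find_makers` and `search_makers` are made up for illustration. Only the `"maker"` field of the response is shown here, so the code walks the decoded JSON generically instead of assuming a particular layout.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the URL above.
BASE_URL = "https://data.fcc.gov:443/api/accessibilityclearinghouse/product/searchProducts"


def build_search_url(search_string, api_key, rows=20):
    """Build the searchProducts URL (hypothetical helper)."""
    params = {
        "api_key": api_key,
        "format": "json",
        "rowPerPage": rows,
        "searchString": search_string,
    }
    # quote_via=quote encodes spaces as %20, as in the URL above.
    return BASE_URL + "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)


def find_makers(node):
    """Collect every "maker" string found anywhere in the decoded JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "maker" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_makers(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_makers(item))
    return found


def search_makers(search_string, api_key):
    """Query the API and return the makers it reports (needs network access)."""
    with urllib.request.urlopen(build_search_url(search_string, api_key)) as resp:
        return find_makers(json.load(resp))
```

Calling `search_makers("galaxy s4", "<your api key>")` would then give the list of makers to inspect; mapping a product entry to phone/tablet still depends on which other fields the response carries.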
You can also query APIs such as eBay's and Amazon's.
The following approach should work, but it requires some programming:
- Create a group of synonyms for each device type you want to categorize (e.g. [phone; cellphone], [tablet; pad]).
- Use the Google Search REST API to get search results for your device name (a more specialized API, such as an internet retailer's, can be used instead).
- Use regular expressions to count how often each synonym in a group appears in the search results.
- The group with the highest total match count across all its synonyms represents your device type.
- If no matches are found, classify the device as 'other'. To avoid false positives, set a minimum number of matches so that an 'other' device is not put into 'tablets' or 'phones' by mistake. I assume the regexp check is run against multiple search result items at once.
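The counting and threshold steps above can be sketched as follows. This is a minimal illustration, not a tuned classifier: the synonym lists, the `min_matches` default and the `classify` name are all assumptions, and `snippets` stands for whatever text you get back from the search API (fetching it is left out).

```python
import re

# Hypothetical synonym groups; extend them for the device types you need.
SYNONYMS = {
    "phone": ["phone", "cellphone", "smartphone"],
    "tablet": ["tablet", "pad"],
}


def classify(snippets, synonyms=SYNONYMS, min_matches=3):
    """Return the device type whose synonyms occur most often in the snippets."""
    text = " ".join(snippets).lower()
    counts = {}
    for device_type, words in synonyms.items():
        # \b keeps "pad" from matching inside words such as "padding";
        # the optional "s" also counts simple plurals.
        pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")s?\b"
        counts[device_type] = len(re.findall(pattern, text))
    best = max(counts, key=counts.get)
    # Too few hits means the evidence is weak, so fall back to 'other'.
    return best if counts[best] >= min_matches else "other"
```

With the threshold at 3, a single stray mention of "tablet" in the results for an unknown device is not enough to classify it as a tablet.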
The main advantage is that your results will always be up to date, since they come from a large, continuously updated search index.
As for the cons: if you use the Google API for free, the number of requests per day is limited (the limit can be raised for a fee). Some manual moderation may also be needed for 'other' devices to make sure your classification program works correctly.
You can roughly estimate the potential of this approach before writing any code: enter a few sample device names into Google and look at the search results. If the results mention the device types you are trying to fill in, the approach is worth implementing.
Since the Google API has strict rate limits and restrictions on commercial use, you may consider another search engine instead, e.g. Yahoo, which allows commercial use if you notify them and offers high rate limits in that case.
I am using the paid version of the http://www.handsetdetection.com/ API, which gives accurate results. They also have a free trial option for testing.
$referer_site = $_SERVER['HTTP_REFERER'];
$useragent = $_SERVER['HTTP_USER_AGENT']; // e.g. "NokiaN95"
$curlOpts = array(
    CURLOPT_URL => "http://api.handsetdetection.com/apiv3/site/detect/xxxxx.json",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPAUTH => CURLAUTH_DIGEST,
    CURLOPT_USERPWD => 'xxxxxxxx:xxxxxxxxx',
    CURLOPT_HTTPHEADER => array('Content-Type: application/json'),
    // json_encode escapes quotes and backslashes in the user agent
    CURLOPT_POSTFIELDS => json_encode(array('user-agent' => $useragent))
);
/******************************************/
$curl = curl_init();
curl_setopt_array($curl, $curlOpts);
$responseBody = curl_exec($curl);
curl_close($curl);
// Decode the JSON response into an object holding the device details
$device_details = json_decode($responseBody);
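For comparison, the same request could look roughly like this in Python, using only the standard library. The URL and the `user-agent` payload key are copied from the PHP above (the `xxxxx` parts remain placeholders); the function names `build_detect_request` and `detect` are invented for this sketch, and the shape of the reply is not assumed beyond it being JSON.

```python
import json
import urllib.request

# Placeholder URL copied from the PHP above; substitute your own site ID.
API_URL = "http://api.handsetdetection.com/apiv3/site/detect/xxxxx.json"


def build_detect_request(user_agent):
    """Build the POST request; json.dumps escapes quotes in the user agent."""
    body = json.dumps({"user-agent": user_agent}).encode("utf-8")
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )


def detect(user_agent, username, password):
    """Send the request with HTTP digest auth and decode the JSON reply."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, API_URL, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr)
    )
    with opener.open(build_detect_request(user_agent)) as resp:
        return json.load(resp)
```

Building the body with `json.dumps` rather than string concatenation keeps the payload valid even when the user agent contains quotes.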
After searching and googling for a while, I came across a site called GSMArena. It is a comparison site for phones and tablets where you can see the full specs of each device. Looking at the source code of the search page, I found a div with the class "makers" that contains all the search results.
Also, once you click on a phone/tablet link, it takes you to a page titled "tablet name - Full tablet specs" if it's a tablet, and "mobile name - Full phone specs" if it's a mobile.
If the site finds a direct match for a search query, it redirects straight to the spec page, so I added an if test to check whether we are on the search page or the spec page.
My program gets the first link in the "makers" div (using BeautifulSoup), follows the link, pulls out the HTML, and then reads the title of the page.
If no results are found, my program marks the device as "other".
Code:
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

searchlist = ["galaxy note", "nexus 10", "nexus 5", "galaxy ace", "moto g", "galaxy tab 2", "MID-97D"]
for searchstr in searchlist:
    other = False
    # URL-encode the query without overwriting the original name
    query = urllib.parse.quote(searchstr)
    searchlink = "http://www.gsmarena.com/results.php3?sQuickSearch=yes&sName=" + query
    string = urllib.request.urlopen(searchlink).read().decode("ISO-8859-1")
    soup = BeautifulSoup(string, "lxml")
    # A direct match redirects to the spec page; otherwise we are on the results page
    if soup.title.string == "Phone Finder results - GSMArena.com":
        makerdiv = soup.find_all('div', attrs={'class': 'makers'})
        links = makerdiv[0].find_all('a') if makerdiv else []
        if len(links) != 0:
            # Follow the first search result to its spec page
            link = "http://www.gsmarena.com/" + links[0].attrs['href']
            string = urllib.request.urlopen(link).read().decode("ISO-8859-1")
            soup = BeautifulSoup(string, "lxml")
        else:
            other = True
    if not other:
        title = soup.title.string
        # Title looks like "<name> - Full phone specs"; split on the last " - "
        name, rest = title.rsplit(" - ", 1)
        taborphone = rest.split(" ")[1]
    else:
        name = searchstr
        taborphone = "other"
    print("Name:", name)
    print("Type:", taborphone)
Output:
Name: Samsung Galaxy Note5
Type: phone
Name: Samsung Google Nexus 10 P8110
Type: tablet
Name: LG Nexus 5X
Type: phone
Name: Samsung Galaxy Ace 3
Type: phone
Name: Motorola Moto G (3rd gen)
Type: phone
Name: Samsung Galaxy Tab 2 7.0 P3100
Type: tablet
Name: MID-97D
Type: other
And it works :)
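The title-parsing step is the most fragile part, so it may be worth isolating it in a small function that can be checked without hitting the site. The sketch below assumes the "name - Full phone/tablet specs" title pattern described above; `parse_spec_title` is a name introduced here for illustration, and any title that does not fit the pattern is treated as "other".

```python
def parse_spec_title(title):
    """Split a spec-page title into (name, device_type)."""
    # Split on the last " - " so hyphens inside the device name are kept.
    name, sep, rest = title.rpartition(" - ")
    words = rest.split()
    # Expect something like "Full phone specs"; anything else is 'other'.
    if not sep or len(words) < 2 or words[0] != "Full":
        return title.strip(), "other"
    return name.strip(), words[1]
```

If the site changes its title format, this function falls back to "other" instead of raising an IndexError.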
Pros:
The results stay up to date, because GSMArena maintains a large database of phones and tablets.
Cons:
It can't be used for devices other than tablets and mobiles, such as netbooks.
I just noticed that @Oli has suggested GSMArena in the comments.