Demystifying Illegal Mobile Gambling Apps
Introduction
These are notes on the paper "Demystifying Illegal Mobile Gambling Apps" from WWW 2021. The authors are Yuhao Gao, Haoyu Wang, Li Li, Xiapu Luo, Guoai Xu, and Xuanzhe Liu.
The paper is a measurement study of illegal gambling apps in China.
Background knowledge
SAN (Subject Alternative Name): an extension of X.509 certificates that lists the additional identities, such as DNS names and IP addresses, that a certificate is valid for. A single TLS certificate can therefore cover many domain names, and domains that appear together in one certificate's SAN list are likely operated by the same party. In this paper SAN refers to this certificate field, not to a storage area network.
Autonomous system (AS): On the Internet, an autonomous system is a network unit that can decide on its own which routing protocol to use internally. It can be a single network or a group of networks controlled by one or more network administrators, and it is managed as a single unit (for example a university, an enterprise, or a company). An autonomous system is sometimes called a routing domain. Each autonomous system is assigned a globally unique number, the autonomous system number (ASN).
CNAME record (alias record): This record type lets you map multiple names to the same computer. It is commonly used for computers that provide both WWW and MAIL services. For example, a computer named "host.mydomain.com" (A record) provides WWW and MAIL services at the same time. To make these services easier for users to reach, two aliases (CNAME records) can be set for this computer: WWW and MAIL.
Dataset source
First identify illegal online gambling sites, then collect illegal gambling apps from those sites.
Online gambling websites: The authors cooperated with a major Chinese ISP to obtain the DNS request data of all users in a major city from August 2019 to January 2020, and retrieved gambling domains from this data set by keyword.
Illegal gambling apps: Semi-automated methods are used to obtain gambling apps from the gambling sites. Some sites offer direct downloads, while others redirect to hidden sites.
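The keyword retrieval step can be illustrated with a minimal sketch. The keyword list and the sample queries below are hypothetical and are not taken from the paper. The sketch only shows how substring matching over queried domain names might work.

```python
from collections import Counter

# Hypothetical keywords; the paper's actual keyword list is not reproduced here.
GAMBLING_KEYWORDS = ("casino", "bet", "lottery", "poker")


def match_gambling_candidates(dns_queries, keywords=GAMBLING_KEYWORDS):
    """Count queried domains whose name contains any of the keywords."""
    hits = Counter()
    for domain in dns_queries:
        # Normalize: drop whitespace, the trailing root dot, and letter case.
        name = domain.strip().rstrip(".").lower()
        if any(keyword in name for keyword in keywords):
            hits[name] += 1
    return hits


# Made-up DNS queries for illustration only.
queries = [
    "www.BigCasino88.com.",
    "example.org",
    "bet365x.net",
    "www.bigcasino88.com",
    "alphabet.com",
]
candidates = match_gambling_candidates(queries)
for name, count in candidates.most_common():
    print(name, count)
```

In this sample "alphabet.com" also matches because it contains "bet", which shows why keyword hits are only candidates and still need to be checked.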
Measurement perspectives
1. Measurement of real-world prevalence of illegal gambling sites
1) Relationship between domain names and applications:
Figure legend: gambling applications (blue), gambling websites (green), category 1 download services (red), and category 2 download services (orange). A category 1 download service has the same domain name as the gambling website, while a category 2 download service has a different domain name from the gambling website. An edge between a gambling site and a category 2 download service indicates that the site uses that service to distribute gambling apps. An edge between a download service (category 1 or category 2) and a gambling application indicates that the application was downloaded from that service.
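The graph described above can be sketched as follows. This is a minimal illustration that assumes exact hostname comparison decides the category, and the record format and example URLs are hypothetical. The paper may compare domain names differently.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercase hostname of a URL, or an empty string."""
    return (urlparse(url).hostname or "").lower()


def build_edges(records):
    """Build graph edges from (site_url, download_url, app_id) records.

    Category 1: the download service shares the gambling site's hostname.
    Category 2: the download service uses a different hostname.
    """
    edges = set()
    categories = {}
    for site_url, download_url, app_id in records:
        site = host_of(site_url)
        service = host_of(download_url)
        category = 1 if service == site else 2
        categories[service] = category
        if category == 2:
            # The site relies on a separate service to distribute the app.
            edges.add((site, service))
        # The app was downloaded from this service.
        edges.add((service, app_id))
    return edges, categories


# Hypothetical records for illustration only.
records = [
    ("https://site-a.example/", "https://site-a.example/app.apk", "com.a.app"),
    ("https://site-b.example/", "https://dl.example.net/b.apk", "com.b.app"),
]
edges, categories = build_edges(records)
print(categories)
print(sorted(edges))
```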
2) Top 10 gambling domain names
3) Abuse of third-party application service channels
A headless browser is used to crawl the download websites, followed by manual verification. Some download services turn out to be commonly used services that gambling applications abuse as download channels.
2. Characteristics of Gambling Apps
1) Website structure
Angle 1: Connected server addresses
Method: First use DroidBot to dynamically explore the UI of each gambling application while it runs on a real mobile phone, and use TCPdump to capture the network traffic. Then filter out public servers and analyze the remaining servers that the app connects to. To filter public servers, the authors build a list of public domain names by dynamically exploring a large set of Android applications and collecting the domains they contact, combine it with Alexa's top 10,000 domain names, and remove these from the domains collected from the gambling apps.
Gambling apps were found to usually connect to many different server addresses (8 on average), and they have a more complex communication process.
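The public-server filtering step can be sketched as below. This is a minimal illustration: the function name, the two input lists, and the rule that a host is public when it or any parent domain is on a list are assumptions, not details confirmed by the paper.

```python
def filter_public_domains(observed, popular_domains, common_app_domains):
    """Drop hosts that belong to a public domain list; keep the rest."""
    public = {d.lower() for d in popular_domains}
    public |= {d.lower() for d in common_app_domains}

    def is_public(host):
        labels = host.lower().rstrip(".").split(".")
        # Check the host itself and every parent domain against the lists.
        return any(".".join(labels[i:]) in public for i in range(len(labels)))

    return [host for host in observed if not is_public(host)]


# Hosts observed in captured traffic (the first and third appear in the notes).
observed = [
    "hjcxapix.com",
    "www.googleapis.com",
    "w1.vip66888.com",
    "graph.facebook.com",
]
popular = ["facebook.com", "google.com"]  # stand-in for a top-sites list
common = ["googleapis.com"]               # stand-in for domains seen in many apps
remaining = filter_public_domains(observed, popular, common)
print(remaining)
```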
The gambling app com.hmobile.core will first connect to hjcxapix.com during app initialization and get a list of app settings. When a user wants to log in, it requests the real login function URL w1.vip66888.com from www.hjcvip.net:844, and sends it the account information. After successful login, a list of gambling games will be returned from w1.vip66888.com. The user can then select a game and the app will request the main game URL gci.hjcvg.com and its resource loading URL gc.vpcdn.com from www.hjcvip.net:844.
Angle 2: Domain Analysis
Distribution of Top Level Domains in Gambling Applications
ASNs of Top 10 Gambling Application Servers
Method: Use Qihoo 360 and VirusTotal to collect the relevant IP addresses, and then use an IP-to-ASN mapping table to obtain the ASN of each server.
Top 10 application server registrars
Email addresses of the top 10 application server registrants
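An IP-to-ASN lookup of the kind mentioned in the method can be sketched with a longest-prefix match. The prefixes and AS numbers below come from ranges reserved for documentation and are purely illustrative. The table format is an assumption, and real mapping tables are much larger and are usually searched with a trie rather than a linear scan.

```python
import ipaddress
from collections import Counter


def build_table(rows):
    """Parse (prefix, asn) rows into (network, asn) pairs."""
    return [(ipaddress.ip_network(prefix), asn) for prefix, asn in rows]


def lookup_asn(ip, table):
    """Return the ASN of the most specific prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, asn in table:
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, asn)
    return best[1] if best else None


# Illustrative table using documentation prefixes and documentation ASNs.
table = build_table([
    ("192.0.2.0/24", 64496),
    ("192.0.2.128/25", 64497),
    ("198.51.100.0/24", 64498),
])

server_ips = ["192.0.2.10", "192.0.2.200", "192.0.2.201", "203.0.113.5"]
asn_counts = Counter(lookup_asn(ip, table) for ip in server_ips)
print(asn_counts.most_common())
```

Counting the looked-up ASNs over all server IPs is one simple way to arrive at a top-10 ranking.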
2) Malicious behavior
Method: Use VirusTotal to identify malicious applications, and then use AVClass to identify malware families. 56% of the gambling apps show malicious behavior.
Top 10 Malicious Gambling App Ranking
3) Abuse of third-party services
Top 10 abuse of third-party domain names/third-party libraries/CNAMEs.
4) Payment service
The process of third-party payment and fourth-party payment
Manual analysis of 10 gambling applications
Payment transaction platform
3. Infer the underground activities behind illegal gambling applications, so as to discover more such applications
1) Application cluster
The authors find that many gambling apps share the same UI structure, and propose a cluster analysis based on code-level similarity and developer signatures to analyze the relationships between gambling applications.
Method: FSquaDRA2, an application clone detection tool based on code similarity and resource similarity, is used to compare all applications pairwise.
Top 10 clusters of gambling apps. #APP is the number of gambling applications in the cluster, #Cert is the number of developer certificates used in the cluster, %Top-1 Cert is the proportion of apps signed with the cluster's most common signing certificate, #Prefix is the number of distinct package name prefixes, and %Top-1 Prefix is the proportion of apps that share the most common package name prefix.
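The pairwise comparison and the table columns can be illustrated with a small sketch. It is not FSquaDRA2's algorithm: it substitutes plain Jaccard similarity over hypothetical per-app feature sets, merges apps above a hypothetical threshold with union-find, and assumes that the package name prefix is the first two labels of the package name.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_apps(features, threshold=0.7):
    """Group apps whose pairwise similarity reaches the threshold."""
    parent = {app: app for app in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for app in features:
        groups.setdefault(find(app), set()).add(app)
    return list(groups.values())


def cluster_stats(cluster, cert_of):
    """Compute the table columns for one cluster of package names."""
    certs = Counter(cert_of[app] for app in cluster)
    prefixes = Counter(".".join(app.split(".")[:2]) for app in cluster)
    n = len(cluster)
    return {
        "#APP": n,
        "#Cert": len(certs),
        "%Top-1 Cert": 100 * certs.most_common(1)[0][1] / n,
        "#Prefix": len(prefixes),
        "%Top-1 Prefix": 100 * prefixes.most_common(1)[0][1] / n,
    }


# Hypothetical apps: feature sets stand in for code and resource hashes.
features = {
    "com.hm.core": {"h1", "h2", "h3", "h4"},
    "com.hm.lite": {"h1", "h2", "h3", "h5"},
    "com.hm.pro": {"h1", "h2", "h3", "h6"},
    "org.other.game": {"x1", "x2"},
}
cert_of = {
    "com.hm.core": "certA",
    "com.hm.lite": "certA",
    "com.hm.pro": "certC",
    "org.other.game": "certB",
}
clusters = sorted(cluster_apps(features, threshold=0.5), key=len, reverse=True)
stats = cluster_stats(clusters[0], cert_of)
print(stats)
```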
2) Identify new gambling apps
Inference over HTTPS
Method: Use VirusTotal to collect the latest certificate information for all gambling domain names, then extract the SAN data from the certificates and identify related domain names by analyzing this data. A total of 53,749 websites were successfully accessed through this method. Of these, 1,000 websites were selected for manual verification, and 961 domains (96.1%) were found to be gambling websites. Then, a feature provided by VirusTotal was used to trace the applications communicating with these gambling servers, which identified a total of 16,973 applications. Of these, 1,000 were downloaded from VirusTotal for manual verification, and 879 of them (87.9%) were actually gambling apps.
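The extraction of Subject Alternative Name (SAN) entries can be sketched as follows. The sketch assumes that each certificate is already available as a dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()`. The paper obtains certificate data from VirusTotal, so this input format, the function names, and the example domains are illustrative assumptions.

```python
def san_dns_names(cert):
    """Return the DNS names listed in a certificate's subjectAltName."""
    names = set()
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "DNS":
            continue  # skip IP addresses and other entry types
        value = value.lower()
        if value.startswith("*."):
            value = value[2:]  # treat a wildcard entry as its base domain
        names.add(value)
    return names


def expand_domains(seed_domains, certs_by_domain):
    """Collect domains that share a certificate with any seed domain."""
    related = set()
    for domain in seed_domains:
        cert = certs_by_domain.get(domain)
        if cert:
            related |= san_dns_names(cert)
    return related - set(seed_domains)


# Hypothetical certificate data for one known gambling domain.
certs_by_domain = {
    "seed-casino.example": {
        "subjectAltName": (
            ("DNS", "seed-casino.example"),
            ("DNS", "*.lucky-bet.example"),
            ("DNS", "vip-slots.example"),
            ("IP Address", "192.0.2.7"),
        )
    }
}
new_domains = expand_domains(["seed-casino.example"], certs_by_domain)
print(sorted(new_domains))
```

Domains found this way are only candidates, which is why the paper verifies a sample manually.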
Inference based on application developer's signature
Method: Collect the developer signatures of the known gambling apps, and then use these signatures to search the applications published in Koodous for apps signed by the same developers.
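Signature-based matching can be sketched as a fingerprint lookup. The sketch does not call the Koodous API. It assumes a hypothetical local catalog of (package name, certificate SHA-256 fingerprint) pairs, and the certificate bytes are placeholders.

```python
import hashlib


def cert_fingerprint(der_bytes):
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def find_related_apps(known_certs, catalog):
    """Return packages in the catalog signed with any known certificate."""
    known = {cert_fingerprint(cert) for cert in known_certs}
    return sorted(pkg for pkg, fingerprint in catalog if fingerprint.lower() in known)


# Placeholder bytes stand in for real signing certificates.
known_certs = [b"cert-a"]
catalog = [
    ("com.x.one", cert_fingerprint(b"cert-a")),
    ("com.y.two", cert_fingerprint(b"cert-b")),
    ("com.x.three", cert_fingerprint(b"cert-a").upper()),
]
related = find_related_apps(known_certs, catalog)
print(related)
```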