Journey of Imagination (奇想游): this blog post is entirely my own handcrafted work, not reposted, and you will not find a copy of it anywhere else online. I have no team; I share it purely for fellow technology enthusiasts, and none of the content involves advertising. My articles are published only on CSDN, Juejin (Nuggets) and my personal blog (which must be on the Journey of Imagination domain); any other copy is pirated!
Related articles:
- Decrypting WeChat PC database files (Journey of Imagination blog, CSDN)
- A brief description of the structure and function of each database file in WeChat PC: root directory (Journey of Imagination blog, CSDN)
Multi
Files in this folder are decrypted the same way as the other databases covered earlier.
The file structure in this folder is relatively simple; there are only three kinds of database: FTSMSG, MediaMSG and MSG. I say three kinds rather than three files because each database here is split into several files once it reaches a certain size.
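Because of this splitting, a tool usually has to collect every part of a database before reading it. Below is a minimal sketch, assuming the split files are named as the type followed by a numeric index (for example MediaMSG0.db, the name used by the export script later in this article); `group_split_dbs` is a hypothetical helper, not part of WeChat.

```python
import re
from collections import defaultdict

# Assumed naming pattern: type prefix followed by a split index, e.g. "MSG0.db".
SPLIT_DB = re.compile(r"^(FTSMSG|MediaMSG|MSG)(\d+)\.db$")


def group_split_dbs(filenames):
    """Group database file names by type, each group ordered by split index."""
    groups = defaultdict(list)
    for name in filenames:
        m = SPLIT_DB.match(name)
        if m:  # ignore files that do not follow the assumed pattern
            groups[m.group(1)].append((int(m.group(2)), name))
    # Sort numerically so that MSG10.db comes after MSG2.db
    return {kind: [name for _, name in sorted(items)] for kind, items in groups.items()}


print(group_split_dbs(["MSG1.db", "MSG0.db", "MediaMSG0.db", "FTSMSG0.db", "other.db"]))
```

Sorting on the parsed integer rather than the raw file name is the main design choice here: plain string sorting would put MSG10.db before MSG2.db.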
FTSMSG
Anyone who has read the overview article will recognize the FTS prefix: it stands for full-text search and marks the index used for searching.
The main contents are the following two tables:
- FTSChatMsg2_content: contains three fields
  - docid: an integer that increments from 1, equivalent to the ID of the entry
  - c0content: search keywords (text typed into the WeChat search box is matched against this field)
  - c1entityId: purpose not yet clear; it may be related to verification
- FTSChatMsg2_MetaData
  - docid: corresponds to docid in the FTSChatMsg2_content table
  - msgId: corresponds to messages in the MSG database
  - entityId: corresponds to c1entityId in the FTSChatMsg2_content table
  - type: possibly the type of the message
  - The remaining fields are unclear
As for the 2 in the table names, my personal guess is that it is the version number of the current database format.
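A minimal sketch of a keyword lookup across these two tables, assuming (as described above) that FTSChatMsg2_MetaData.docid matches FTSChatMsg2_content.docid. FTSChatMsg2_content looks like the `%_content` shadow table that SQLite's FTS3/FTS4 module creates, so the sketch reads it directly with LIKE instead of using FTS MATCH; my assumption is that the virtual table itself may depend on a WeChat-specific tokenizer. The in-memory tables are a hypothetical stand-in for a decrypted FTSMSG file, and `search_msg_ids` is a made-up helper name.

```python
import sqlite3


def search_msg_ids(conn, keyword):
    """Return msgId values whose indexed text contains `keyword` (substring match)."""
    sql = (
        "SELECT m.msgId FROM FTSChatMsg2_content AS c "
        "JOIN FTSChatMsg2_MetaData AS m ON c.docid = m.docid "
        "WHERE c.c0content LIKE ? ORDER BY c.docid"
    )
    return [row[0] for row in conn.execute(sql, (f"%{keyword}%",))]


# Hypothetical stand-in tables with the columns described above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FTSChatMsg2_content (docid INTEGER PRIMARY KEY, c0content TEXT, c1entityId INTEGER)")
conn.execute("CREATE TABLE FTSChatMsg2_MetaData (docid INTEGER, msgId INTEGER, entityId INTEGER, type INTEGER)")
conn.executemany("INSERT INTO FTSChatMsg2_content VALUES (?, ?, ?)", [(1, "hello world", 10), (2, "see you", 11)])
conn.executemany("INSERT INTO FTSChatMsg2_MetaData VALUES (?, ?, ?, ?)", [(1, 5001, 10, 1), (2, 5002, 11, 1)])
print(search_msg_ids(conn, "hello"))  # [5001]
```

LIKE treats `%` and `_` in the keyword as wildcards, so a real tool would escape them; the sketch skips that for brevity.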
MediaMSG
All voice messages are stored here. The database contains exactly one table, Media, with three useful fields:
- Key
- Reserved0
- Buf
The Reserved0 field corresponds one-to-one to the MsgSvrID of messages in the MSG database.
The third field, Buf, holds the binary voice data. Inspecting its header shows that the audio is stored in SILK format, a speech codec developed for Skype (now part of Microsoft) and later open-sourced; search online if you want the details.
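To check the header programmatically, here is a small heuristic. "#!SILK_V3" is the standard SILK v3 magic string; the optional leading 0x02 byte is something WeChat/QQ voice files are commonly reported to carry, which I am treating as an assumption. `is_silk` is a hypothetical helper.

```python
SILK_MAGIC = b"#!SILK_V3"


def is_silk(blob: bytes) -> bool:
    """Heuristic SILK v3 check; skips one leading 0x02 byte if present (assumed WeChat quirk)."""
    if blob[:1] == b"\x02":
        blob = blob[1:]
    return blob.startswith(SILK_MAGIC)


print(is_silk(b"\x02#!SILK_V3\x0c\x00"))  # True
print(is_silk(b"RIFF....WAVE"))           # False
```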
Here is the code to export the data in the Buf field to a file:
```python
import sqlite3


def writeTofile(data, filename):
    # Write the raw voice bytes to disk
    with open(filename, 'wb') as file:
        file.write(data)
    print("Stored blob data into: ", filename, "\n")


def readBlobData(key):
    sqliteConnection = None  # keeps the finally block safe if connect() fails
    try:
        sqliteConnection = sqlite3.connect('dbs/decoded_MediaMSG0.db')
        cursor = sqliteConnection.cursor()
        print("Connected to SQLite")
        # Name the columns explicitly instead of relying on SELECT * column order
        sql_fetch_blob_query = """SELECT Key, Reserved0, Buf FROM Media WHERE Key = ?"""
        cursor.execute(sql_fetch_blob_query, (key,))
        record = cursor.fetchall()
        for row in record:
            print("Key = ", row[0], "Reserved0 = ", row[1])
            file = row[2]  # Buf: SILK-encoded voice data
            print("Storing on disk \n")
            path = f'{row[0]}.silk'
            writeTofile(file, path)
        cursor.close()
    except sqlite3.Error as error:
        print("Failed to read blob data from sqlite table", error)
    finally:
        if sqliteConnection:
            sqliteConnection.close()
            print("sqlite connection is closed")


readBlobData(1099511630953)
```
If you need to find a voice file by the MsgSvrID of a message in the MSG database, change the SQL query to match on Reserved0 instead of Key, and then iterate over all the MediaMSG databases.
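A minimal sketch of that search, relying on the Reserved0 ↔ MsgSvrID correspondence described above. The `decoded_MediaMSG*.db` pattern is hypothetical and simply follows the file name used in the export script; adjust it to however your decrypted files are named. `find_voice_by_msgsvrid` is a made-up helper name.

```python
import sqlite3
from pathlib import Path


def find_voice_by_msgsvrid(db_dir, msg_svr_id, pattern="decoded_MediaMSG*.db"):
    """Yield (db_path, Key, Buf) for every Media row whose Reserved0 equals msg_svr_id."""
    for db_path in sorted(Path(db_dir).glob(pattern)):
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(
                "SELECT Key, Buf FROM Media WHERE Reserved0 = ? ORDER BY Key",
                (msg_svr_id,),
            ).fetchall()
        finally:
            conn.close()  # close each split database before moving to the next
        for key, buf in rows:
            yield db_path, key, buf
```

For example, iterating over `find_voice_by_msgsvrid("dbs", some_msgsvrid)` and writing each `buf` to `f"{key}.silk"` reproduces the export above without knowing which split file holds the message.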
The following code converts the silk file to wav (the idea is to convert it to PCM first and then to WAV; the WAV sample rate was determined through my own testing):
```python
import wave
from pathlib import Path

import pilk

KEY = 1099511630953  # Key of the voice record exported above


def pcm2wav(pcm_file, wav_file, channels=1, bits=16, sample_rate=24000):
    # Wrap raw PCM samples in a WAV container
    if bits % 8 != 0:
        raise ValueError("bits % 8 must == 0. now bits:" + str(bits))
    with open(pcm_file, 'rb') as pcmf:
        pcmdata = pcmf.read()
    with wave.open(wav_file, 'wb') as wavfile:
        wavfile.setnchannels(channels)
        wavfile.setsampwidth(bits // 8)
        wavfile.setframerate(sample_rate)
        wavfile.writeframes(pcmdata)


# SILK -> PCM; pilk.decode returns the voice duration
duration = pilk.decode(f"{KEY}.silk", f"{KEY}.pcm")
# print("Voice duration:", duration)
Path(f"{KEY}.silk").unlink()
# PCM -> WAV, then delete the intermediate PCM file
pcm2wav(f"{KEY}.pcm", f"{KEY}.wav")
Path(f"{KEY}.pcm").unlink()
```
I won't explain these two snippets in detail; read through them yourself.
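Since the sample rate was found by testing, here is one way to sanity-check it. Raw PCM has no header, so its duration is simply bytes / (sample_rate × channels × bytes per sample). If the value computed from the .pcm file size matches the voice length WeChat shows in the chat, the assumed 24000 Hz, mono, 16-bit parameters are probably right. `pcm_duration` is a hypothetical helper whose defaults just mirror `pcm2wav` above.

```python
def pcm_duration(num_bytes, sample_rate=24000, channels=1, bits=16):
    """Seconds of audio in a raw PCM stream of num_bytes bytes."""
    bytes_per_second = sample_rate * channels * (bits // 8)
    return num_bytes / bytes_per_second


print(pcm_duration(48000))  # 1.0
```

For example, call `pcm_duration(Path(f"{KEY}.pcm").stat().st_size)` before the .pcm file is deleted.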
MSG
Finally we arrive at the most important part of this folder, no, of the entire project: MSG, the core database of chat records!
The two main tables inside are MSG and Name2ID.
Name2ID has only one column, whose values are either a WeChat ID or a group chat ID in the form `<group chat ID>@chatroom`; its purpose is to let some fields in MSG refer to these accounts. Although the table has no ID column, WeChat implicitly uses the row number (counting from 1) as the ID.
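A minimal sketch of building that implicit mapping, assuming the "row number" is SQLite's built-in rowid (which starts at 1 for a freshly filled table). `SELECT rowid, *` avoids having to know the real column name; the `UsrName` column in the demo table and the sample values are hypothetical, and `load_name2id` is a made-up helper.

```python
import sqlite3


def load_name2id(conn):
    """Map implicit row IDs to WeChat IDs / chatroom IDs, assuming the ID is SQLite's rowid."""
    return {row[0]: row[1] for row in conn.execute("SELECT rowid, * FROM Name2ID")}


# Demo table with a hypothetical column name; only the row order matters here
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name2ID (UsrName TEXT)")
conn.executemany("INSERT INTO Name2ID VALUES (?)", [("wxid_example",), ("12345@chatroom",)])
names = load_name2id(conn)
print(names)  # {1: 'wxid_example', 2: '12345@chatroom'}
```

If the guess about TalkerId below holds, `names.get(talker_id)` then resolves the conversation a message belongs to.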
The rest of this section is mainly about the MSG table (notes about what still needs to be confirmed are reminders to myself, not important information):
- localId: literally the local ID of the message; I haven't found its purpose yet
- TalkerId: the ID of the conversation the message belongs to, corresponding to a row in Name2ID (this is a guess; see the StrTalker field for the reasoning)
- MsgSvrID: "Srv" is probably short for Server, so this is presumably the ID of the message as stored on the server
- Type: message type; see Table 1 for details
- SubType: message subtype; I have not yet seen what it is actually used for
- IsSender: whether you sent the message yourself, i.e. whether it is shown on the right or the left of the conversation page; the value is 0 or 1
- CreateTime: second-resolution timestamp of when the message was created. Further experiments are needed to confirm exactly which moment it records. My guess at the rules:
  - Messages sent from this computer: the moment the send button was clicked for that message
  - Messages sent by other users, or by yourself from other devices: the moment the message was received locally from the server
- Sequence: although it looks like a millisecond timestamp, it is not. It is CreateTime followed by three more digits, usually 000; if two messages share the same CreateTime, those last three digits increase in turn. Whether it is unique within one conversation or across all conversations still needs to be confirmed.
- StatusEx, FlagEx, Status, MsgServerSeq, MsgSequence: I have not yet extracted any useful information from these five fields
- StrTalker: the WeChat ID of the message sender. This suggests that the TalkerId field above most likely refers to the conversation the message belongs to rather than the sender. Of course, it may also hold the same content as TalkerId; this needs to be confirmed.
- StrContent: the message content as a string. Except for text messages, for most other types this field holds a piece of XML describing related information.
- DisplayContent: for "pat" messages, stores the account information of the person who patted and the person who was patted
- Reserved0~6: no useful information extracted yet; some of these fields are always empty
- CompressContent: literally "compressed content"; in practice it holds data that WeChat does not want to keep in StrContent (for example, text messages that quote another message). These messages can only be told apart by their binary content, and I don't know the exact format or how to extract the data.
- BytesExtra: extra data in binary format
- BytesTrans: so far this field has always been empty
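A sketch of reading the two time fields, assuming CreateTime is an ordinary Unix timestamp in seconds and that Sequence really is CreateTime × 1000 plus a tie-break counter, as guessed above. Both helper names are hypothetical.

```python
from datetime import datetime, timezone


def create_time_to_datetime(create_time):
    """Convert a second-resolution Unix timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(create_time, tz=timezone.utc)


def split_sequence(sequence):
    """Split Sequence into (CreateTime, counter), assuming Sequence = CreateTime * 1000 + counter."""
    return divmod(sequence, 1000)


print(create_time_to_datetime(1672531200))  # 2023-01-01 00:00:00+00:00
print(split_sequence(1672531200001))        # (1672531200, 1)
```

Calling `.astimezone()` on the result converts it to the local time zone. Checking that `split_sequence(Sequence)[0] == CreateTime` holds for every row is also a quick way to test the Sequence guess.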
There is a lot of guesswork here, and many of the items I marked for further testing are still unfinished. After decryption, the database copy is not updated in real time as new messages arrive, and whenever a new message comes in I don't know which of the split databases it will end up in, so experimenting is extremely inefficient.
Table 1: MSG.Type field values and their meanings (this may also apply to fields that mark the message type in other databases)
Type | Subtype | Meaning |
---|---|---|
1 | 0 | Text |
3 | 0 | Picture |
34 | 0 | Voice |
43 | 0 | Video |
47 | 0 | Animated sticker (third-party sticker packs) |
49 | 1 | A message similar to, but not the same as, a text message; so far I have only seen an invitation to register for Alibaba Cloud Drive. Presumably similar to subtype 57 |
49 | 5 | Card-style link; title, description, etc. in CompressContent, locally cached cover path in BytesExtra |
49 | 6 | File; file name and download link in CompressContent (but I could not read them), local save path in BytesExtra |
49 | 8 | GIF sticker uploaded by the user; CompressContent contains a CDN link, but it does not seem to be downloadable directly |
49 | 19 | Merged and forwarded chat history; detailed records in CompressContent, caches of pictures, videos, etc. in BytesExtra |
49 | 33/36 | Shared mini program; card information in CompressContent, cover cache location in BytesExtra |
49 | 57 | Text message with a quote (StrContent is empty for this type; both the sent and the quoted content are in CompressContent) |
49 | 63 | WeChat Channels live stream, live replay, etc. |
49 | 87 | Group announcement |
49 | 88 | WeChat Channels live stream, live replay, etc. |
49 | 2000 | Money transfer (including sending, receiving, and voluntary refunds) |
49 | 2003 | Gifted red envelope cover |
10000 | 0 | System notification (the gray text shown in the center) |
10000 | 4 | Pat |
10000 | 8000 | System notification (in particular, when you invite someone to a group chat) |
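To label messages in bulk, Table 1 can be turned into a lookup keyed by (Type, subtype). This assumes the subtype column of the table comes from the SubType field, which I have not confirmed; `MSG_TYPES` and `describe` are hypothetical names, and unknown combinations are reported rather than guessed.

```python
# (Type, subtype) -> short description, transcribed from Table 1
MSG_TYPES = {
    (1, 0): "text",
    (3, 0): "picture",
    (34, 0): "voice",
    (43, 0): "video",
    (47, 0): "animated sticker",
    (49, 1): "text-like message",
    (49, 5): "card-style link",
    (49, 6): "file",
    (49, 8): "user-uploaded GIF sticker",
    (49, 19): "merged forwarded chat history",
    (49, 33): "shared mini program",
    (49, 36): "shared mini program",
    (49, 57): "text message with quote",
    (49, 63): "Channels live stream or replay",
    (49, 87): "group announcement",
    (49, 88): "Channels live stream or replay",
    (49, 2000): "money transfer",
    (49, 2003): "gifted red envelope cover",
    (10000, 0): "system notification",
    (10000, 4): "pat",
    (10000, 8000): "system notification (group invitation)",
}


def describe(msg_type, sub_type):
    """Look up a (Type, subtype) pair; unlisted pairs are reported as unknown."""
    return MSG_TYPES.get((msg_type, sub_type), f"unknown ({msg_type}, {sub_type})")
```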
References for this article (in no particular order):