Scraping Articles from a Security Community

Timestamp: 2024-05-22

Take this URL as an example: https://xz.aliyun.com/t/7450

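The first response sets a cookie via the header Set-Cookie: acw_tc=707c9fc717163581948566458e262b1c41a7d271e4bef8e17a9aa7ec55c582;

The response body is a large chunk of JS, recognizable at a glance as the Alibaba-family cookie anti-scraping scheme.

A second request follows immediately, and this time the response body is the article in plain text.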

GET /t/7450?time__1311=n4%2BxnD0G0%3Dit0QDkID%2FiWR%2BesG8DBinD%2B7YD HTTP/1.1
Host: xz.aliyun.com
Cookie: acw_tc=707c9fc717163581948566458e262b1c41a7d271e4bef8e17a9aa7ec55c582; tfstk=fJAx9ZNLF0mmHGDMGEMkbdqdtogotILqon8QsGj0C3KJuEXsuhvm15tRqIjDos_62w_QhGO975Q65huVjqkkuE5N1DmH6XY2u-YGQ4OAlzNWzNb_CUR1jn5N1DmotXY2u1-h36IjBUgRSNebch6fVTsPRZ11cOZ7Fws5f116fg15Sww15GwbFaGDirIglGF9ZRyPUOruvJ2HwaiNAEnu6aAPkxjpognIDQE1HMTf25ih14vAXgRIqYLyfKtczhhK269e2CBC6jEFkLTdNifIp8CvrnRdGBnYiaXODLOfePN6REJcCdKxD-SXn3BFPTatTaxhVERXeVrWlHjAGa6oO2L5CLAVLIm7M69eosv6A02ccpQ14EdH9610xMQgh438QRWfrZOebi-3NHENyM0uERyNEa7Rx438QRWfzaInrb2aQTbP.

There are two main changes:

  • The Cookie header now carries two fields, acw_tc and tfstk

  • The request has a new query parameter, time__1311
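
To compare captures, it helps to percent-decode the query string of the second request. The following is a minimal sketch using only the Python standard library; the helper name `extra_params` is hypothetical, and the request target is copied from the capture above.

```python
from urllib.parse import urlsplit, parse_qs

# Request target copied from the second request shown above.
request_target = "/t/7450?time__1311=n4%2BxnD0G0%3Dit0QDkID%2FiWR%2BesG8DBinD%2B7YD"


def extra_params(target: str) -> dict:
    """Return the percent-decoded query parameters of a request target."""
    query = urlsplit(target).query
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = extra_params(request_target)
print(params)
```

The decoded value contains `+`, `=` and `/`, so any client that replays it has to percent-encode it again.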

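Start by analyzing the JS returned in the first response. It is heavily obfuscated and almost unreadable.

I tried https://obf-io.deobfuscate.io/ for a first deobfuscation pass, but it did not seem to help much.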

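An XHR breakpoint was never hit, so the second request is not sent via XHR. My guess was that the page navigates by assigning location.href.

The last line of the JS contains a suspicious document reference, and sure enough, that is where the navigation happens.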

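From this we can tell that the function _0x48736b returns the string representation of the URL object.

Next I searched the code for search, with no result.

So I traced back up the call stack.

Translated into readable form, the code here sums the ASCII codes of all characters in host: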

var _0x420dc7 = 0x0;
for (let index = 0; index < host.length; index++) {
    _0x420dc7 += host['charCodeAt'](index);
}

Next, an array stores the different type prefixes, and the entry is selected using the _0x420dc7 computed above:

_0x3a7bdd = ['type__', 'refer__', 'ipcity__', 'md5__', 'decode__', 'encode__', 'time__', 'timestamp__', 'type__']
_0x3e621b = _0x3a7bdd[_0x420dc7 % 9] + _0x420dc7 % 10000

At this point we can verify the request parameter name seen above:

let sum = 0x0;
let host = 'xz.aliyun.com';
for (let index = 0; index < host.length; index++) {
    sum += host['charCodeAt'](index);
}
let types = ['type__', 'refer__', 'ipcity__', 'md5__', 'decode__', 'encode__', 'time__', 'timestamp__', 'type__'];
let param = types[sum % 9] + sum % 10000
console.log(param);  // time__1311

So for this hostname, the request parameter name is always time__1311.
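
The same derivation can be ported to Python so that a scraper does not need to hard-code the parameter name. This is a minimal sketch: the function name `param_name` is hypothetical, the prefix table is copied from the script above, and it assumes the host is plain ASCII so that `ord` matches `charCodeAt`.

```python
# Prefix table copied from the deobfuscated script above.
PREFIXES = ['type__', 'refer__', 'ipcity__', 'md5__', 'decode__',
            'encode__', 'time__', 'timestamp__', 'type__']


def param_name(host: str) -> str:
    """Derive the query parameter name from the hostname."""
    # Sum of character codes; for ASCII hosts ord() equals JS charCodeAt().
    total = sum(ord(ch) for ch in host)
    # The prefix is picked by total % 9; the suffix is total % 10000.
    return PREFIXES[total % len(PREFIXES)] + str(total % 10000)


print(param_name('xz.aliyun.com'))  # time__1311
```

For xz.aliyun.com the character codes sum to 1311, and 1311 % 9 is 6, which selects 'time__'.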

Further down is the code that generates the value. Translated, it looks like this:

var _0x318558 = 0x0;
var _0x465be6, _0x44becc;
var url = 'https://xz.aliyun.com/t/7450';
for (_0x465be6 = 0x0; _0x465be6 < url.length; _0x465be6++) {
    _0x44becc = url.charCodeAt(_0x465be6);
    _0x318558 = ((_0x318558 << 0x7) - _0x318558) + 0x18e + _0x44becc;
    _0x318558 |= 0x0;
}
console.log(_0x318558) // -1400793666
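
Since `(h << 7) - h` is `h * 127`, the loop above is a rolling hash with multiplier 127 and an additive constant of 0x18e, wrapped to a signed 32-bit integer by `|= 0`. The sketch below ports it to Python, where integers do not overflow and the wrap has to be emulated. The names `to_int32`, `url_seed` and `url_seed_shift` are hypothetical, and ASCII input is assumed.

```python
def to_int32(value: int) -> int:
    """Wrap a Python int to a signed 32-bit value, like JS `value | 0`."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


def url_seed(url: str) -> int:
    """Rolling hash in multiplier form: seed = seed * 127 + 0x18e + code."""
    seed = 0
    for ch in url:
        seed = to_int32(seed * 127 + 0x18E + ord(ch))
    return seed


def url_seed_shift(url: str) -> int:
    """Same hash written with the shift form used by the original script."""
    seed = 0
    for ch in url:
        seed = to_int32((to_int32(seed << 7) - seed) + 0x18E + ord(ch))
    return seed


example = 'https://xz.aliyun.com/t/7450'
print(url_seed(example), url_seed(example) == url_seed_shift(example))
```

Both forms agree because all arithmetic is taken modulo 2**32 before the sign is reinterpreted.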

Then the _0x30f62c below is assigned to time__1311:

_0x30f62c = _0x56d97c(_0x318558 + '|' + 0 + '|' + (new Date().getTime() + ''));
// n4+xnD0G0=it0QDkID/iWR+epdDt=DReK6YD
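
The argument passed to _0x56d97c is a pipe-separated string: the URL hash, a literal 0, and the current time in milliseconds. A minimal Python sketch of building that string follows. The function name `build_plaintext` and the timestamp in the usage line are illustrative assumptions, and how encrypt.js consumes the string is not shown here.

```python
import time
from typing import Optional


def build_plaintext(seed: int, now_ms: Optional[int] = None) -> str:
    """Build the 'seed|0|timestamp' string that is fed to the encoder."""
    if now_ms is None:
        # Equivalent of JS `new Date().getTime()`: milliseconds since epoch.
        now_ms = int(time.time() * 1000)
    return f"{seed}|0|{now_ms}"


# Arbitrary example timestamp, only for illustration.
print(build_plaintext(-1400793666, 1716358194856))
```

Because the timestamp is part of the input, the encoded value changes between requests even for the same URL. This is consistent with the value captured earlier differing from the one in the comment above.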

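The _0x56d97c function is very long and hard to translate, so I simply lifted the code as-is.

Testing shows that the article can be fetched without any cookies. Only time__1311 is required.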

POC:

import execjs
import requests

with open('encrypt.js', 'r') as file:
    js_code = file.read()

ctx = execjs.compile(js_code)

url = 'https://xz.aliyun.com/t/7450'
seed = 0
for ch in url:
    # Emulate JS signed 32-bit arithmetic: wrap to 32 bits first...
    seed = (((seed << 7) - seed) + 0x18e + ord(ch)) & 0xFFFFFFFF
    # ...then reinterpret the top bit as the sign (the effect of `|= 0` in JS).
    if seed & (1 << 31):
        seed -= 1 << 32

data = {
    'time__1311': ctx.call('encrypt', str(seed))
}
r = requests.get(url, params=data)
print(r.text)

Success!

encrypt.js 👉 Click Me