爬取鲜花网站数据

news/2025/2/2 21:13:29 标签: 爬虫

待爬取网页:
在这里插入图片描述
代码:

import requests

from lxml import  etree
import pandas as pd

from lxml import html
import xlwt

url = "https://www.haohua.com/xianhua/"

header = {
    "accept":"image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "accept-encoding":"gzip, deflate, br, zstd",
    "accept-language":"zh-CN,zh;q=0.9",
    "cookie":"MUID=35169CD2EDEA6D7E149B88BEECB06C7B; SRCHD=AF=NOFORM; SRCHUID=V=2&GUID=06DEDF3E60F3437B9D1E0E0541286638&dmnchg=1; MUIDB=35169CD2EDEA6D7E149B88BEECB06C7B; MMCASM=ID=5709703A12A449E3A5153FAA872F0450; _UR=QS=0&TQS=0&Pn=1; _TTSS_IN=hist=WyJ6aC1IYW5zIiwiZW4iLCJhdXRvLWRldGVjdCJd&isADRU=0; _TTSS_OUT=hist=WyJlbiIsInpoLUhhbnMiXQ==; _tarLang=default=zh-Hans&newFeature=tonetranslation; _EDGE_S=SID=10AB24CBE0666F783D443148E1B46E27; _Rwho=u=d&ts=2025-01-29; _SS=SID=10AB24CBE0666F783D443148E1B46E27&R=200&RB=0&GB=0&RG=200&RP=200&PC=U316; SRCHUSR=DOB=20240521&T=1738198155000&TPC=1736825154000; USRLOC=HS=1&ELOC=LAT=31.554468154907227|LON=117.24475860595703|N=%E8%82%A5%E8%A5%BF%E5%8E%BF%EF%BC%8C%E5%AE%89%E5%BE%BD%E7%9C%81|ELT=4|; SNRHOP=I=&TS=; _HPVN=CS=eyJQbiI6eyJDbiI6ODksIlN0IjoxLCJRcyI6MCwiUHJvZCI6IlAifSwiU2MiOnsiQ24iOjg5LCJTdCI6MCwiUXMiOjAsIlByb2QiOiJIIn0sIlF6Ijp7IkNuIjo4OSwiU3QiOjAsIlFzIjowLCJQcm9kIjoiVCJ9LCJBcCI6dHJ1ZSwiTXV0ZSI6dHJ1ZSwiTGFkIjoiMjAyNS0wMS0zMFQwMDowMDowMFoiLCJJb3RkIjowLCJHd2IiOjAsIlRucyI6MCwiRGZ0IjpudWxsLCJNdnMiOjAsIkZsdCI6MCwiSW1wIjo2MDgsIlRvYm4iOjB9; _RwBf=r=0&ilt=835&ihpd=0&ispd=8&rc=200&rb=0&gb=0&rg=200&pc=200&mtu=0&rbb=0&g=0&cid=&clo=0&v=15&l=2025-01-29T08:00:00.0000000Z&lft=2025-01-13T00:00:00.0000000-08:00&aof=0&ard=0001-01-01T00:00:00.0000000&rwdbt=0&rwflt=0&o=2&p=&c=&t=0&s=0001-01-01T00:00:00.0000000+00:00&ts=2025-01-30T01:37:12.0686804+00:00&rwred=0&wls=&wlb=&wle=&ccp=&cpt=&lka=0&lkt=0&aad=0&TH=&rwaul2=0; SRCHHPGUSR=SRCHLANG=zh-Hans&BRW=XW&BRH=S&CW=1495&CH=217&SCW=1479&SCH=217&DPR=1.5&UTC=480&DM=0&WTS=63873794963&PRVCW=1494&PRVCH=765&PV=15.0.0&HV=1738201032&BZA=0&WEBTHEME=0&THEME=0&EXLTT=31&AV=14&ADV=14&RB=0&MB=0",
    "ect":"4g",
    "priority":"i",
    "referer":"https://cn.bing.com/chrome/newtab",
    "sec-ch-ua":'"Not A(Brand";v="8", "Chromium";v="132", "Google Chrome";v="132"',
    "sec-ch-ua-arch":"x86",
    "sec-ch-ua-bitness":"64",
    "sec-ch-ua-full-version":"132.0.6834.111",
    "sec-ch-ua-full-version-list":'"Not A(Brand";v="8.0.0.0", "Chromium";v="132.0.6834.111", "Google Chrome";v="132.0.6834.111"',
    "sec-ch-ua-mobile":"?0",
    "sec-ch-ua-model":"",
    "sec-ch-ua-platform":"Windows",
    "sec-ch-ua-platform-version":"15.0.0",
    "sec-fetch-dest":"image",
    "sec-fetch-mode":"no-cors",
    "sec-fetch-site":"same-origin",
    "user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36"
}
response = requests.get(url = url,headers = header)

response.encoding = "utf-8"
# print(response.text)


# price = tree.xpath('//a[@class="info imghover"]/p[@class="price b"]/span[not(@class)]/text()')
#
# print(price[0].strip())


html = etree.HTML(response.text)

# print(html)

fresh_flowers = []
popularity = []
original_price = []
now_price = []

name = html.xpath('//a[@class = "info imghover"]/h5')

for i in name:
    fresh_flowers.append(i.text)

# for i in xianhua_name:
#     print(i)

price = html.xpath('//a[@class = "info imghover"]/p')

for i in price:
    original_price.append(i[1].text)
    popularity.append(i[2].text)


datalist = []
datalist.append(fresh_flowers)
datalist.append(original_price)
datalist.append(popularity)




# 将数据组织成字典
data = {
    "fresh_flowers": fresh_flowers,
    "original_price": original_price,
    "popularity": popularity
}

# 创建DataFrame
df = pd.DataFrame(data)

# 将DataFrame写入Excel文件
df.to_excel("xianhua_data.xlsx", index=False)

print("数据已成功写入Excel文件")



# print(len(xianhua_name))
# print(len(original_price))
# print(len(popularity))





结果文件:
在这里插入图片描述


http://www.niftyadmin.cn/n/5840301.html

相关文章

Flutter常用Widget小部件

小部件Widget是一个类,按照继承方式,分为无状态的StatelessWidget和有状态的StatefulWidget。 这里先创建一个简单的无状态的Text小部件。 Text文本Widget 文件:lib/app/app.dart。 import package:flutter/material.dart;class App exte…

[Java基础]开发工具Idea

安装工具 IDE: 称为集成开发环境, 把代码编写,编译,执行等功能综合在一起的工具 卸载 控制面板->卸载程序->卸载->勾选清空配置->确认卸载 下载/安装 官网下载: IntelliJ IDEA – the Leading Java and Kotlin IDE 默认安装: 旗舰版安装无需任何勾选, 傻瓜安装…

数据库内存与Buffer Pool

数据库内存与Buffer Pool 文章目录 数据库内存与Buffer Pool一:MySQL内存结构1:MySQL工作组件2:工作线程的本地内存3:共享内存区域4:存储引擎缓冲区 二:InnoDB的核心:Buffer Pool1:数…

pytorch实现长短期记忆网络 (LSTM)

人工智能例子汇总:AI常见的算法和例子-CSDN博客 LSTM 通过 记忆单元(cell) 和 三个门控机制(遗忘门、输入门、输出门)来控制信息流: 记忆单元(Cell State) 负责存储长期信息&…

蓝桥杯python语言基础(1)——编程基础

目录 一、python开发环境 二、python输入输出 (1)print输出函数 print(*object,sep,end\n,......) (2)input输入函数 input([prompt]), 输入的变量均为str字符串类型! input()会读入一整行的信息 ​编…

Scratch 《像素战场》系列综合游戏:像素战场游戏Ⅰ~Ⅲ 介绍

资源下载 Scratch《像素战场》系列综合游戏合集:像素战场游戏Ⅰ~Ⅲ压缩包 https://download.csdn.net/download/leyang0910/90332765 游戏操作介绍 Scratch 《像素战场Ⅰ》操作规则: 这是一款与朋友一起玩的 1v1 游戏。先赢得6轮胜利! WA…

如何编写一个MyBatis插件?

大家好,我是锋哥。今天分享关于【Redis为什么这么快?】面试题。希望对大家有帮助; 如何编写一个MyBatis插件? 1000道 互联网大厂Java工程师 精选面试题-Java资源分享网 编写 MyBatis 插件需要使用 MyBatis 提供的插件接口,MyBa…

CSS 基础:层叠、优先级与继承

CSS 基础:层叠、优先级与继承 一、层叠(Cascade)示例:层叠的顺序 二、优先级(Specificity)优先级规则示例:优先级的比较 三、继承(Inheritance)哪些属性会被继承&#xf…