Web scraping has become an indispensable tool for gathering data from across the Internet, enabling data analysts, tech enthusiasts, and businesses to make informed decisions. But extracting the data is only the first step. To unlock its full potential, you need to export it efficiently into the right format, whether that's a CSV file for spreadsheets, JSON for APIs, or a database for large-scale storage and analysis.
This blog will walk you through the essentials of exporting web-scraped data. You'll learn, step by step, how to work with CSV and JSON files, how to integrate scraped data with databases, and how to get the most out of your data management practices.
Before diving into the script, let’s understand the dataset and workflow that we’ll use to demonstrate the data-saving process.
We’ll be scraping data from the website Books to Scrape, which provides a list of books along with their titles, prices, and availability.
This website is designed for practice purposes, making it an ideal choice for showcasing web scraping techniques.
Here’s the process we’ll follow:

1. Use the `requests` and `BeautifulSoup` libraries to extract the book details from the website.
2. Store the scraped data in a Pandas DataFrame.
3. Save the DataFrame to CSV, JSON, and an SQLite database.

To run the script, you’ll need the following Python libraries: `requests`, `beautifulsoup4`, and `pandas`.
Install these libraries using `pip`. Run the following command in your terminal:
```bash
pip install requests beautifulsoup4 pandas
```
Here’s the Python script to scrape the data from the website and store it in a Pandas DataFrame:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Scrape data from the website
def scrape_books():
    url = "https://books.toscrape.com/"
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception("Failed to load page")

    soup = BeautifulSoup(response.content, "html.parser")
    books = []

    # Extract book data
    for article in soup.find_all("article", class_="product_pod"):
        title = article.h3.a["title"]
        price = article.find("p", class_="price_color").text.strip()
        availability = article.find("p", class_="instock availability").text.strip()
        books.append({"Title": title, "Price": price, "Availability": availability})

    # Convert to DataFrame
    books_df = pd.DataFrame(books)
    return books_df

# Main execution
if __name__ == "__main__":
    print("Scraping data...")
    books_df = scrape_books()
    print("Data scraped successfully!")
    print(books_df)
```
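To try it out, save the script to a file (the name `scrape_books.py` below is just an example) and run it from your terminal:

```bash
python scrape_books.py
```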
The table we will use to demonstrate the data-saving process is structured as follows:
| Title | Price | Availability |
|---|---|---|
| A Light in the Attic | £51.77 | In stock |
| Tipping the Velvet | £53.74 | In stock |
| Soumission | £50.10 | In stock |
| Sharp Objects | £47.82 | In stock |
| Sapiens: A Brief History of Humankind | £54.23 | NA |
| The Requiem Red | £22.65 | In stock |
| ... | ... | ... |
To save the scraped data as a CSV file, use the `to_csv` method from Pandas:
```python
def save_to_csv(dataframe, filename="books.csv"):
    dataframe.to_csv(filename, index=False)
    print(f"Data saved to {filename}")
```
Code Explanation:

- `filename`: Specifies the name of the output file.
- `index=False`: Ensures the index column is not included in the CSV file.
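For instance, you can call the function on the scraped DataFrame and read the file back to verify the export. Here is a minimal sketch, assuming the `scrape_books` and `save_to_csv` functions from the snippets above are defined:

```python
import pandas as pd

# Scrape the books and write them to CSV (assumes scrape_books and
# save_to_csv from the snippets above are in scope)
books_df = scrape_books()
save_to_csv(books_df, "books.csv")

# Read the file back to confirm the export round-trips cleanly
restored = pd.read_csv("books.csv")
print(restored.head())
```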
To export the data as JSON, use the `to_json` method from Pandas:
```python
def save_to_json(dataframe, filename="books.json"):
    dataframe.to_json(filename, orient="records", indent=4)
    print(f"Data saved to {filename}")
```
Code Explanation:

- `orient="records"`: Each row in the DataFrame is converted into a JSON object.
- `indent=4`: Formats the JSON for better readability.
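As a quick check, you can load the exported file with the standard library and inspect its structure. A minimal sketch, assuming the functions from the snippets above are defined:

```python
import json

# Export the scraped DataFrame to JSON (assumes scrape_books and
# save_to_json from the snippets above are in scope)
books_df = scrape_books()
save_to_json(books_df, "books.json")

# With orient="records", the file is a JSON array of objects, one per row,
# e.g. {"Title": "A Light in the Attic", "Price": "£51.77", "Availability": "In stock"}
with open("books.json", encoding="utf-8") as f:
    records = json.load(f)
print(records[0])
```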
To store the data in a database, use the `to_sql` method from Pandas with SQLite:
```python
import sqlite3

def save_to_database(dataframe, database_name="books.db"):
    conn = sqlite3.connect(database_name)
    dataframe.to_sql("books", conn, if_exists="replace", index=False)
    conn.close()
    print(f"Data saved to {database_name} database")
```
Code Explanation:

- `sqlite3.connect(database_name)`: Connects to the SQLite database (creates it if it doesn’t exist).
- `to_sql("books", conn, if_exists="replace", index=False)`: Writes the DataFrame to a table named `books`; `if_exists="replace"` overwrites the table if it already exists, and `index=False` keeps the DataFrame index out of the table.
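Once the data is in SQLite, you can query it back with SQL. A minimal sketch, assuming `books.db` was created by the `save_to_database` function above:

```python
import sqlite3
import pandas as pd

# Reopen the database created by save_to_database above
conn = sqlite3.connect("books.db")

# Query the table back into a DataFrame; here we filter to in-stock titles
in_stock = pd.read_sql_query(
    "SELECT Title, Price FROM books WHERE Availability = 'In stock'", conn
)
conn.close()
print(in_stock.head())
```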
While formats like CSV or JSON work well for smaller projects, databases offer superior performance, query optimization, and data integrity when handling larger datasets. The seamless integration of Pandas with SQLite makes it simple to store, retrieve, and manipulate data efficiently. Whether you're building a data pipeline or a complete application, understanding how to leverage databases will greatly enhance your ability to work with data effectively. Start using these tools today to streamline your data workflows and unlock new possibilities!