not a data engineer. just someone who keeps needing to pull data from websites for different projects and kept hitting the same walls over and over. here's what i tried before i found something that stuck.

1- beautifulsoup: fine for basic stuff. the second a site uses javascript it returns nothing useful. spent more time figuring out why it wasn't working than actually using it.

2- scrapy: powerful but felt like overkill. setting it up for a simple project felt like way too much work. gave up after two days.

3- selenium: worked but slow as hell. also kept breaking whenever a site updated its layout, so i was constantly patching selectors instead of actually getting data.

4- apify: actually decent but the pricing crept up fast once i started scraping at any real volume. got a bill i didn't expect and just stopped using it.

5- firecrawl: been using this for the past few months. one api call, get back clean markdown, javascript rendering handled, no extra parsing. the stuff that used to take me days to set up now takes an hour. using it for a couple of ai projects where i need clean web data going into an llm, and i just get the data and move on.

not saying the others are bad. scrapy and apify are solid for certain things. but for the kind of projects i build (one person, moving fast, needing clean data for ai pipelines), firecrawl is the only one i didn't eventually abandon.

would love to know which tools you're using these days btw
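for anyone wondering what the beautifulsoup failure in point 1 actually looks like: a javascript-heavy site usually serves an empty html shell and fills it in client-side, so a plain parse finds nothing. a minimal sketch (the html string is a made-up example of a typical single-page-app response, not from any real site):

```python
# why beautifulsoup "returns nothing useful" on a js-rendered site:
# the server ships an empty shell and the real content is injected
# by javascript, which beautifulsoup never executes.
from bs4 import BeautifulSoup

# hypothetical response body from a javascript-heavy site
html = """
<html>
  <body>
    <div id="root"></div>
    <script src="/app.js"></script>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
root = soup.find("div", id="root")

# the div exists, but there's no text inside it to scrape
print(repr(root.get_text(strip=True)))  # → ''
```

same parse against a server-rendered page would give you the actual content, which is why it works fine "for basic stuff" and then silently fails elsewhere.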
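and the "one api call, get back clean markdown" flow from point 5, sketched with plain stdlib http. heads up: the endpoint url, payload fields, and response shape here are my recollection of firecrawl's v1 scrape api, not something from this post — check their docs before relying on any of it:

```python
# sketch of a single scrape call returning markdown, assuming
# firecrawl's v1 scrape endpoint. endpoint path, payload keys, and
# response structure are assumptions — verify against the official docs.
import json
import os
import urllib.request

API_URL = "https://api.firecrawl.dev/v1/scrape"  # assumed endpoint

payload = {
    "url": "https://example.com",   # page to scrape
    "formats": ["markdown"],        # ask for markdown instead of raw html
}
headers = {
    "Authorization": f"Bearer {os.environ.get('FIRECRAWL_API_KEY', '')}",
    "Content-Type": "application/json",
}

req = urllib.request.Request(
    API_URL, data=json.dumps(payload).encode(), headers=headers
)

# only fire the request if a key is actually configured
if os.environ.get("FIRECRAWL_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        # assumed response shape: {"data": {"markdown": "..."}}
        markdown = body["data"]["markdown"]
        print(markdown[:200])  # clean text, ready to feed an llm
```

the whole appeal is that the javascript rendering and html-to-markdown cleanup happen server-side, so there's no parsing code of your own to maintain when a site changes its layout.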