[Selenium] Crawling with Selenium on a MacBook


Joyfullyever 2025. 2. 3. 11:05

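#1. Since Selenium 4.6, the bundled Selenium Manager automatically downloads a chromedriver that matches your installed Chrome, so you no longer need to install chromedriver separately. (Selenium 4 also added support for the Chrome DevTools Protocol (CDP).)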

https://www.selenium.dev/downloads/


#2. Download the latest JAR file from the official website

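On my first attempt I downloaded from the Java section at the top of the page, which meant collecting a lot of separate JAR files, and crawling still would not work without chromedriver.

Instead, download the Selenium Server (Grid) release listed further down the page. It is a single JAR (selenium-server-4.28.1 at the time of writing) that already bundles the Java client classes.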

#3. Create the crawling project and add the downloaded Selenium file to it

Right-click the project's src folder → Build Path → Configure Build Path → Libraries tab → select Classpath → click Add External JARs on the right and add the downloaded [selenium-server-4.28.1] file.
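Before writing any crawling code, it can help to confirm that the JAR really ended up on the project's classpath. The following is a minimal plain-Java sketch that assumes only the standard library: the class SeleniumClasspathCheck and its isOnClasspath method are hypothetical names, and the check simply tries to load a few Selenium class names without initializing them.

```java
// Hypothetical helper class: checks that the Selenium JAR is on the classpath
public class SeleniumClasspathCheck {

    // Returns true if the class can be loaded from the current classpath.
    // initialize=false loads the class without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SeleniumClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class itself, or one of the classes it depends on, is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Selenium classes used later in this post
        String[] required = {
            "org.openqa.selenium.WebDriver",
            "org.openqa.selenium.chrome.ChromeDriver",
            "org.openqa.selenium.support.ui.WebDriverWait"
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "OK      " : "MISSING ") + name);
        }
    }
}
```

Run it as a Java application inside the crawling project. Every line should start with OK; a MISSING line means the JAR is not yet on this project's build path.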

#4. Import the Selenium classes

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

Alternatively, write the code first and then press [Command] + [Shift] + [O] (Eclipse's Organize Imports) to add all the imports at once.

#5. Crawling

// Hypothetical method name; the original post shows only the method body
public static ArrayList<String> crawlNetflixTop20() {
    ArrayList<String> datas = new ArrayList<>();

    try {
        // Chrome options
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");            // current headless mode
        options.addArguments("--no-sandbox");              // mainly needed on Linux/Docker
        options.addArguments("--disable-dev-shm-usage");   // mainly needed on Linux/Docker
        options.addArguments("--remote-allow-origins=*");  // avoids the DevTools WebSocket 403 error seen with some Chrome versions

        // Create the WebDriver (Selenium Manager resolves chromedriver automatically)
        WebDriver driver = new ChromeDriver(options);

        try {
            // Load the page
            driver.get("https://m.kinolights.com/ranking/netflix");

            // Give the page extra time to render its list (fixed 3-second pause)
            Thread.sleep(3000);

            // Explicit wait (up to 10 seconds): blocks until the given elements appear
            WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
            wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.cssSelector(".info__title")));
            wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.cssSelector("span.score__number")));

            // Collect every element matching .info__title and span.score__number
            List<WebElement> titleElements = driver.findElements(By.cssSelector(".info__title"));
            List<WebElement> scoreElements = driver.findElements(By.cssSelector("span.score__number"));

            // Print the results (top 20 only)
            System.out.println("\n===== Netflix TOP 20 =====");
            int count = 0;
            // Bound by both lists so a missing score cannot cause an IndexOutOfBoundsException
            int size = Math.min(titleElements.size(), scoreElements.size());
            for (int i = 0; i < size && count < 20; i++) {
                String title = titleElements.get(i).getText().trim();
                String ratingText = scoreElements.get(i).getText().trim();
                if (!title.isEmpty() && !ratingText.isEmpty()) {
                    String result = (count + 1) + ". " + title + " - Rating: " + ratingText;
                    System.out.println(result);
                    datas.add(result);
                    count++;
                }
            }
        } finally {
            // Always close the browser, even if an exception was thrown
            driver.quit();
            System.out.println("Browser closed");
        }
    } catch (Exception e) {
        System.out.println("Error: " + e.getMessage());
        e.printStackTrace();
    }

    return datas;
}

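Run the code above and you can confirm that the crawl works! (This relies on the site still using the .info__title and span.score__number class names.)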
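The explicit wait in the crawling code is easier to reason about with a simplified model. The sketch below is not Selenium's implementation; it is a plain-Java polling loop that assumes the semantics WebDriverWait.until() documents: re-check the condition until it returns something other than null or false, or until the timeout expires (Selenium polls every 500 ms by default). PollingWait and waitUntil are hypothetical names, and the "page" is simulated with a counter.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical class illustrating the idea behind an explicit wait
public class PollingWait {

    // Re-evaluates the condition every `interval` until it returns a value that is
    // neither null nor Boolean.FALSE, or throws TimeoutException once `timeout` elapses
    static <T> T waitUntil(Supplier<T> condition, Duration timeout, Duration interval)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T value = condition.get();
            if (value != null && !Boolean.FALSE.equals(value)) {
                return value;  // condition satisfied
            }
            if (System.nanoTime() >= deadline) {
                throw new TimeoutException("Condition not met within " + timeout);
            }
            Thread.sleep(interval.toMillis());  // pause before polling again
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated page: the "element" only becomes available on the 4th poll
        int[] polls = {0};
        String found = waitUntil(
                () -> ++polls[0] >= 4 ? "info__title" : null,
                Duration.ofSeconds(2),
                Duration.ofMillis(50));
        System.out.println("Found " + found + " after " + polls[0] + " polls");
    }
}
```

Unlike a fixed Thread.sleep(), a polling wait returns as soon as the condition holds and only fails once the whole timeout has passed.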