<?xml version="1.0" encoding="utf-8"?>
<!-- generator="FeedCreator 1.7.2-ppt DokuWiki" -->
<?xml-stylesheet href="http://ocw.cs.pub.ro/courses/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="http://ocw.cs.pub.ro/courses/feed.php">
        <title>CS Open CourseWare ii:labs:03:tasks</title>
        <description></description>
        <link>http://ocw.cs.pub.ro/courses/</link>
        <image rdf:resource="http://ocw.cs.pub.ro/courses/lib/tpl/arctic/images/favicon.ico" />
        <dc:date>2026-04-04T10:14:33+03:00</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/01?rev=1731634317&amp;do=diff"/>
                <rdf:li rdf:resource="http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/02?rev=1730973775&amp;do=diff"/>
                <rdf:li rdf:resource="http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/03?rev=1731634956&amp;do=diff"/>
                <rdf:li rdf:resource="http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/04?rev=1731670968&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="http://ocw.cs.pub.ro/courses/lib/tpl/arctic/images/favicon.ico">
        <title>CS Open CourseWare</title>
        <link>http://ocw.cs.pub.ro/courses/</link>
        <url>http://ocw.cs.pub.ro/courses/lib/tpl/arctic/images/favicon.ico</url>
    </image>
    <item rdf:about="http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/01?rev=1731634317&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-11-15T03:31:57+03:00</dc:date>
        <title>01. [30p] Python environment</title>
        <link>http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/01?rev=1731634317&amp;do=diff</link>
        <description>01. [30p] Python environment

Python libraries are collections of reusable code that provide functionality for a wide range of tasks, from data analysis and machine learning to web development and automation. Libraries are often hosted on the Python Package Index (PyPI) and can be easily installed using package managers like pip.</description>
    </item>
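The task above describes installing libraries from PyPI with pip. A minimal sketch of checking for and installing the packages the later tasks rely on, assuming the package names `requests` and `beautifulsoup4` (the import name for the latter is `bs4`):

```python
# Sketch: install a PyPI package with pip if its module is not importable.
# The package/module pairs below are assumptions based on the later tasks.
import importlib.util
import subprocess
import sys

def ensure_installed(package: str, module: str) -> None:
    """Install `package` with pip only when `module` cannot be imported."""
    if importlib.util.find_spec(module) is None:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

if __name__ == "__main__":
    ensure_installed("requests", "requests")
    ensure_installed("beautifulsoup4", "bs4")
```

Running the module with `python -m pip install …` (rather than a bare `pip`) ties the install to the interpreter that will import the library.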
    <item rdf:about="http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/02?rev=1730973775&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-11-07T12:02:55+03:00</dc:date>
        <title>02. [20p] Making HTTP Requests</title>
        <link>http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/02?rev=1730973775&amp;do=diff</link>
        <description>02. [20p] Making HTTP Requests

Now that we have the requests library, we can easily send HTTP requests to any URL, prompting the server to respond with the information we need. When the request is successful, the server replies with the standard status code 200 (OK), indicating everything went smoothly. Simply replace the URL with the desired website (try a Wikipedia page), and you’re ready to go!</description>
    </item>
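The request described above can be sketched with the requests library; the Wikipedia URL here is an illustrative placeholder, not one taken from the task text:

```python
# Sketch: fetch a page with the requests library and check for 200 (OK).
import requests

def fetch_page(url: str):
    """Return the page's HTML, or None if the server did not reply 200 (OK)."""
    response = requests.get(url, timeout=10)
    if response.status_code == 200:   # 200 means the request succeeded
        return response.text
    return None

if __name__ == "__main__":
    html = fetch_page("https://en.wikipedia.org/wiki/Web_scraping")
    if html is not None:
        print(f"Fetched {len(html)} characters")
```

The `timeout` argument keeps the script from hanging indefinitely when a server never answers.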
    <item rdf:about="http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/03?rev=1731634956&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-11-15T03:42:36+03:00</dc:date>
        <title>03. [40p] Parsing the HTML content</title>
        <link>http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/03?rev=1731634956&amp;do=diff</link>
        <description>03. [40p] Parsing the HTML content

Now, let’s parse through the HTML we just received. We’ll use BeautifulSoup, a powerful library commonly used in web scraping. BeautifulSoup helps us navigate and work with the HTML content of any page, making it easy to locate specific data we want to extract.</description>
    </item>
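A small sketch of the parsing step with BeautifulSoup (the `bs4` package); the sample HTML, tag names, and class name are illustrative assumptions, not content from the task:

```python
# Sketch: navigate HTML with BeautifulSoup and extract specific elements.
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Lab 03</h1>
  <p class="intro">Web scraping basics.</p>
  <a href="/page/2">Next</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.find("h1").get_text()                # locate a single tag
intro = soup.find("p", class_="intro").get_text() # filter by CSS class
links = [a["href"] for a in soup.find_all("a")]   # collect every link

print(title)   # Lab 03
print(links)   # ['/page/2']
```

In a real scraper, `html` would be the text of the response fetched in the previous task rather than a string literal.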
    <item rdf:about="http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/04?rev=1731670968&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-11-15T13:42:48+03:00</dc:date>
        <title>04. [20p] Handling multi-page websites</title>
        <link>http://ocw.cs.pub.ro/courses/ii/labs/03/tasks/04?rev=1731670968&amp;do=diff</link>
        <description>04. [20p] Handling multi-page websites

Most websites have multiple pages, so our scraper should be capable of handling this by navigating through pagination. Pagination is typically controlled through the URL, so we’ll need to make a separate request for each page. By identifying the pattern in the website’s subpage URLs, we can fetch each page in turn to gather all the data we need.</description>
    </item>
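The pagination loop described above can be sketched as follows; the base URL and the `page` query parameter are hypothetical assumptions, and the pattern should be adapted to the real site's URL scheme:

```python
# Sketch: request each subpage of a paginated site in turn.
import requests

BASE_URL = "https://example.com/articles"  # placeholder, not a real target

def scrape_all_pages(last_page: int) -> list:
    """Collect the raw HTML of pages 1..last_page, stopping on the first failure."""
    pages = []
    for page in range(1, last_page + 1):
        response = requests.get(BASE_URL, params={"page": page}, timeout=10)
        if response.status_code != 200:   # missing page: assume we are past the end
            break
        pages.append(response.text)
    return pages
```

Passing the page number via `params` lets requests build and percent-encode the query string (`?page=N`) instead of concatenating it by hand.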
</rdf:RDF>
