Differences

This shows you the differences between two versions of the page.

ewis:laboratoare:04 [2022/03/29 15:58]
alexandru.predescu [Resources]
ewis:laboratoare:04 [2023/03/29 15:23] (current)
alexandru.predescu [Web Scraping in Python]
Line 7: Line 7:
 ==== Introduction to OOP in Python ===== ==== Introduction to OOP in Python =====
  
-Object Oriented Programming (OOP) is a programming ​that allows us to organize software as a collection of objects that consist of both data and behavior.+Object Oriented Programming (OOP) is a paradigm ​that allows us to organize software as a collection of objects that consist of both data and behavior.
  
 The advantages of OOP: The advantages of OOP:
Line 17: Line 17:
   ***Extensibility**:​ adding new features can be solved by creating new objects, without affecting the existing ones. Changes inside a class do not affect any other part of a program   ***Extensibility**:​ adding new features can be solved by creating new objects, without affecting the existing ones. Changes inside a class do not affect any other part of a program
  
-**Encapsulation, polymorphism, abstraction, inheritance** are fundamentals in object oriented programming language (in Python it's a bit more loose implementation)+**Encapsulation, polymorphism, abstraction, inheritance** are the fundamental principles of object-oriented programming (in Python they are enforced a bit more loosely)
  
-<note tip>In Python we can use OOP but it's not mandatory. Other programming languages (e.g. Java, C#) are actually centered on OOP paradigm ​with better support for enterprise development. However Python is more often used as a scripting language, focused on simplicity, ​and OOP can be hard to master.</​note>​+  ***Encapsulation**:​ data and functionality are contained and accessible via a single unit 
 +  ***Abstraction**: abstract units expose only a high-level interface and hide the implementation details 
 +  ***Inheritance**:​ the procedure in which one class inherits the attributes and methods of another class 
 +  ***Polymorphism**:​ the provision of a single interface to entities of different types  
 + 
 +<note tip>​Python ​is often used as a scripting language, focused on simplicity and flexibility,​ so we can use OOP but it's not mandatory. This is because, in practice, OOP is easy to learn but hard to master. Other programming languages (e.g. Java, C#) are actually centered on the OOP paradigm ​to provide ​better support for enterprise ​software ​development ​(less flexible but more organized ​and maintainable).</​note>​
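
The four principles listed above can be sketched in a few lines of Python. This is only an illustration under assumed names: the classes //Shape//, //Circle// and //Square// are hypothetical and are not part of the lab code.

<code python>
import math


class Shape:
    """Abstraction: callers rely only on area() and describe()."""

    def __init__(self, name):
        # Encapsulation: the data (name) lives next to the behavior that uses it
        self.name = name

    def area(self):
        # Subclasses are expected to provide the actual computation
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"


class Circle(Shape):  # Inheritance: Circle reuses describe() from Shape
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):
        return self.side ** 2


# Polymorphism: the same call works on objects of different types
shapes = [Circle(1.0), Square(2.0)]
descriptions = [shape.describe() for shape in shapes]
for line in descriptions:
    print(line)
</code>

The loop never checks the concrete type of each object; it only uses the interface defined by //Shape//.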
  
 ==== Classes and Objects ==== ==== Classes and Objects ====
  
-A class is a user-defined data structure from which objects are created. Classes provide a means of bundling data (variables) and functionality (functions) together. Encapsulationdata methods.+A class is a user-defined data structure from which objects are created. Classes provide a means of bundling data (variables) and functionality (functions) together. ​**Encapsulation** is the most important principle of OOP where data (attributes) and functionality (methods) are contained and accessible via a single unit. **Abstraction** is another core principle, which is similar to encapsulation but exposes only a high-level interface and hides the implementation details.
  
-For example in a banking application different objects may be bank account, customer type, branch.+<note tip>For example, in a banking application different objects may be **bank account**, **customer**, **customer type**, **branch**. These can contain specific methods and attributes, can be related (e.g. a bank account belongs to a customer of some type - individual/business, and was created at a branch), and should be easy to use, maintain and extend as the application becomes larger.</note>
  
 In Python, a class is defined using ''class'' and its methods (functions) are defined using ''def''. Instance methods **always** take the instance as their first parameter, which by convention is named ''self''. ''self'' is not a keyword; it refers to the instance of the class, and is used to access the attributes and methods of that instance. In Python, a class is defined using ''class'' and its methods (functions) are defined using ''def''. Instance methods **always** take the instance as their first parameter, which by convention is named ''self''. ''self'' is not a keyword; it refers to the instance of the class, and is used to access the attributes and methods of that instance.
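
As a minimal sketch of how ''self'' is used, consider the hypothetical class //BankAccount// below (the class, its attributes and the method //deposit// are assumptions made for illustration, loosely following the banking example).

<code python>
class BankAccount:
    def __init__(self, owner, balance=0.0):
        # self refers to the object being created
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        # Reject invalid input before changing the state of the object
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.balance += amount
        return self.balance


account = BankAccount("Alice")

# Python passes the object `account` as `self` automatically
account.deposit(50.0)

# Equivalent explicit form: call the function on the class and pass the instance
BankAccount.deposit(account, 25.0)

print(account.owner, account.balance)
</code>

Both calls update the same object, which shows that ''self'' is simply the instance the method was called on.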
Line 64: Line 69:
 <​note>​ <​note>​
  
-**T1 (1p)** Create a class //Student// with the instance attributes //name// and //grade// and a method //​change_grade//​. Use the class to create two instances with the names //Alice// and //Bob// and the method //​change_grade//​ to assign their grades. ​+**T1 (2p)** Create a class //Student// with the instance attributes //name// and //grade// and a method //​change_grade//​. Use the class to create two instances with the names //Alice// and //Bob// and the method //​change_grade//​ to assign their grades. ​
  
 </​note>​ </​note>​
Line 125: Line 130:
 **T2 (1p)** Override the method //say_hi// to show the grade as well. **T2 (1p)** Override the method //say_hi// to show the grade as well.
   *Hint: You can define (override) the method in the //Student// class and re-use the method defined in the parent class   *Hint: You can define (override) the method in the //Student// class and re-use the method defined in the parent class
 +**T3 (1p)** **Polymorphism** represents a key principle of OOP. To understand this principle, create a list that contains multiple objects of class //Person// and //​Student//​. For each of the elements print the name using the method //say_hi//. Is there any difference between the two types of objects when we use them in the main program?
 </​note>​ </​note>​
  
Line 172: Line 178:
 </​code>​ </​code>​
  
-<note tip>​Sometimes we can replace inheritance with composition and achieve a similar result – this principle is called [[https://​en.wikipedia.org/​wiki/​Composition_over_inheritance|Composition over inheritance]] ​- sometimes considered preferable because it makes the code easier to design ​and maintain.</​note>​+<note tip>Complex class hierarchies can become hard to understand. ​Sometimes we can replace inheritance with composition and achieve a similar result – this principle is called [[https://​en.wikipedia.org/​wiki/​Composition_over_inheritance|Composition over inheritance]] ​and can make the code easier to understand ​and maintain.</​note>​
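
A minimal sketch of composition, assuming two hypothetical classes //Engine// and //Car// (neither is part of the lab code): instead of inheriting from //Engine//, a //Car// object creates and owns an //Engine// and delegates work to it.

<code python>
class Engine:
    def __init__(self, horsepower):
        self.horsepower = horsepower

    def start(self):
        return f"engine with {self.horsepower} hp started"


class Car:
    def __init__(self, model, horsepower):
        self.model = model
        # Composition: the car creates and owns its engine ("has-a" relation)
        self.engine = Engine(horsepower)

    def start(self):
        # Delegate to the contained object instead of inheriting from it
        return f"{self.model}: {self.engine.start()}"


car = Car("hatchback", 90)
message = car.start()
print(message)
</code>

Replacing the engine with a different implementation only requires changing the attribute, while the //Car// interface stays the same.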
  
 <​note>​ <​note>​
 **T3 (1p)** Add the two methods (//​add_course_grade//,​ //​compute_gpa//​) to the Student class **T3 (1p)** Add the two methods (//​add_course_grade//,​ //​compute_gpa//​) to the Student class
  
-**T4 (1p)** Examine the code. How are the objects //student// and //​course_grade//​ related? (aggregation vs composition).+**T4 (1p)** Examine the code. How are the objects //student// and //​course_grade//​ related? (aggregation vs composition)
 </​note>​ </​note>​
  
Line 221: Line 227:
  
 <code python> <code python>
-from subprocess import Popen, PIPE +import requests 
-from lxml import etree + 
-from io import StringIO +# url to scrape data from
-user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'+
 url = 'https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops' url = 'https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops'
-print("fetching" + url) + 
-get = Popen(['curl', '-s', '-A', user_agent, url], stdout=PIPE) +print("fetching page") 
-result = get.stdout.read().decode('utf8') + 
-tree = etree.parse(StringIO(result), etree.HTMLParser()) +# get response object 
-str_tree = etree.tostring(tree, encoding='utf8', method='xml') +response = requests.get(url) 
-str_data = str_tree.decode()+ 
 +# get byte string 
 +byte_data = response.content 
 + 
 +# get html source code 
 +html_data = byte_data.decode("utf-8") 
 print("writing file") print("writing file")
 with open("index.html", "w", encoding="utf-8") as f: with open("index.html", "w", encoding="utf-8") as f:
-    f.write(str_data)+    f.write(html_data)
 </​code>​ </​code>​
  
-<note tip>The Python script ​uses ''​curl'',​ the command line tool that can request the web page from the HTTP server. You can find more about ''​curl'' ​[[https://curl.se/docs/httpscripting.html|here]].</​note>​+<note tip>The Python script ​makes an HTTP request ​to retrieve ​the web page from the server. You can find more about HTTP requests ​[[https://developer.mozilla.org/​en-US/docs/Web/​HTTP/​Overview|here]].</​note>​
  
 To parse the HTML file (separating the different tags in the HTML), we use the //etree// module from //lxml// To parse the HTML file (separating the different tags in the HTML), we use the //etree// module from //lxml//
Line 246: Line 257:
  
 filename = "​index.html"​ filename = "​index.html"​
 +parser = etree.HTMLParser()
 tree = etree.parse(filename) tree = etree.parse(filename)
 tags = [[elem.tag, elem.attrib,​ elem.text] for elem in tree.iter()] tags = [[elem.tag, elem.attrib,​ elem.text] for elem in tree.iter()]
Line 268: Line 280:
   * [[https://​python-textbok.readthedocs.io/​en/​1.0/​Object_Oriented_Programming.html|Object-Oriented Programming in Python]]   * [[https://​python-textbok.readthedocs.io/​en/​1.0/​Object_Oriented_Programming.html|Object-Oriented Programming in Python]]
   * [[https://​docs.python.org/​3/​library/​datetime.html|datetime — Basic date and time types]]   * [[https://​docs.python.org/​3/​library/​datetime.html|datetime — Basic date and time types]]
 +  * [[https://​en.wikipedia.org/​wiki/​Composition_over_inheritance|Composition over inheritance]]
 +  * [[https://​www.w3schools.com/​html/​|HTML Tutorial]]
 +  * [[https://​lxml.de/​api/​lxml.etree._Element-class.html|lxml API]]
  
  
ewis/laboratoare/04.1648558693.txt.gz · Last modified: 2022/03/29 15:58 by alexandru.predescu
CC Attribution-Share Alike 3.0 Unported