
Show feature count of layer via Python console / PyQGIS


When working with vector layers in QGIS, I can display the feature count of a layer and, if applicable, the feature count per attribute category (e.g. when using categorized or graduated styling) via the context menu of the layer. The counts are added to the layer title that is displayed in the layer widget.

How can I activate this feature via the Python console or by running a script? Looking through the API docs, I found a method to get the feature counts and use them further in a script. However, how can I display them as they would show when using the GUI?

It is probably possible somehow by retrieving the feature count as described above and then changing the layer widget symbology, adding the count to the title. However, this is rather cumbersome, so I wonder if there is a direct way to achieve this? If not, how would this best be done manually?


That is pretty simple using custom properties, set with the setCustomProperty() method.

The example below adds a memory layer to the legend and sets the "showFeatureCount" property to True (you can run it from the PyQGIS console):

    ## create the memory layer and add it to the registry
    ## (QGIS 2.x API; in QGIS 3 QgsMapLayerRegistry was removed, use QgsProject.instance().addMapLayer(myLayer, False))
    myLayer = QgsVectorLayer("Point", "myLayer", "memory")
    QgsMapLayerRegistry.instance().addMapLayer(myLayer, False)

    ## reference to the layer tree
    root = QgsProject.instance().layerTreeRoot()

    ## add the memory layer to the layer tree at index 0
    myLayerNode = QgsLayerTreeLayer(myLayer)
    root.insertChildNode(0, myLayerNode)

    ## set custom property
    myLayerNode.setCustomProperty("showFeatureCount", True)

To get a specific property of the current layer (layer node) in the layer tree, you can use customProperty(), or customProperties() to get all the stored properties for that layer:

    myLayerNode.customProperty("showFeatureCount")   ## the result is True
    myLayerNode.customProperties()                   ## the result is a list [u'showFeatureCount']

Using the Python window

Using tool dialog boxes is the most common way to execute geoprocessing operations for those new to geoprocessing. When only a single tool must be executed at a time, these are a good way to run operations. Geoprocessing tool dialog boxes are easy to use and provide immediate feedback by placing warning or error icons and messages next to parameters not being used correctly. However, there are other, more efficient ways to execute geoprocessing tools or operations, such as Python scripting.

The Python window is a fully interactive Python interpreter (or interface) that allows geoprocessing tools and python functionality to be executed inside an ArcGIS for Desktop application. This window is the best location to directly access Python scripting functionality in ArcGIS. Skills learned in the Python window can be directly applied when creating more complex stand-alone Python scripts or Python script tools.

The simplest way to use Python in ArcGIS is to enter Python commands into the Python window. The Python window prompts with three greater-than symbols ( >>>), indicating the first line of the code block to execute. Simple Python syntax can be immediately entered and executed from this first line. Since the Python code that is entered can be immediately executed by pressing the ENTER key, the Python window can become a useful location to run and view experimental code. If unsure how a particular Python command works, open the Python window and experiment until the command runs without raising an error.

There are several key features that make the Python window a valuable resource for running and experimenting with Python commands and syntax:

  • All Python functionality is exposed through the Python window.
  • Multiline commands that contain more than one geoprocessing tool or geoprocessor method can be entered and executed.
  • Tools or functions that have already been entered and executed can be recalled, edited, and reexecuted.
  • Python commands or blocks of code can be loaded from existing Python files.
  • Python commands or blocks of code can be saved to a Python or text file to reload later or used in a different environment.
  • Autocompletion functionality makes filling in geoprocessing tool parameters quicker and easier than using tool dialog boxes.

In the above example, a simple statement is printed and a variable is assigned a value. Notice that after the print statement and after the variable count, the return value is echoed in the Python window.


You can globally set printing options. I think this should work:
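A minimal sketch of such a global setting, assuming pandas is imported as `pd`. The two option names below are standard pandas display options; setting them to `None` removes the limit:

```python
import pandas as pd

# Show every column and every row instead of truncating with "...".
# These settings are global and stay in effect for the rest of the session.
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
```

To go back to the defaults later, `pd.reset_option("display.max_columns")` and `pd.reset_option("display.max_rows")` can be used.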

This will allow you to see all column names and rows when you call .head(). None of the column names will be truncated.

If you just want to see the column names you can do:

To obtain all the column names of a DataFrame, df_data in this example, you just need to use df_data.columns.values. This gives you an array with all the column names of your DataFrame.
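As a small illustration of the attribute mentioned above: the DataFrame contents here are made up for the example, only the name `df_data` comes from the answer.

```python
import pandas as pd

# Hypothetical data; any DataFrame works the same way.
df_data = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, 0.7]})

# .columns.values is a NumPy array of the labels; tolist() turns it into a
# plain Python list, which is never abbreviated when printed.
column_names = df_data.columns.values.tolist()
print(column_names)
```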

In the interactive console, it's easy to do:

This will do the trick. Note the use of display() instead of print.

The use of display is required because pd.option_context settings only apply to display and not to print .

What worked for me was the following:

You can also set it to an integer larger than your number of columns.

The easiest way I've found is just

Personally I wouldn't want to change the globals; it's not that often that I want to see all the column names.

To get all column names you can iterate over data_all2.columns.

You will get all column names. Or you can store all column names to another list variable and then print list.

Not a conventional answer, but I guess you could transpose the dataframe to look at the rows instead of the columns. I use this because I find looking at rows more 'intuitional' than looking at columns:

This should let you view all the rows. This action is not permanent, it just lets you view the transposed version of the dataframe.

If the rows are still truncated, just use print(data_all2.T) to view everything.

If you just want to see all the columns you can do something of this sort as a quick fix

Now cols will behave as an iterable that can be indexed. For example:

A quick and dirty solution would be to convert it to a string

would cause all of them to be printed out separated by tabs. Of course, do note that with 102 names, all of them rather long, this will be a bit hard to read through.

I had lots of duplicate column names, and once I ran

I was able to see the full list of columns

I know it is a repetition but I always end up copy pasting and modifying YOLO's answer:



The communication protocol between processes uses pickling, and the pickled data is prefixed with the size of the pickled data. For your method, all arguments together are pickled as one object.

You produced an object that, when pickled, is larger than fits in an i struct format (a four-byte signed integer), which breaks the assumptions the code has made.
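The limit can be seen in isolation with the standard `struct` module. This is only a sketch of the size check, not the multiprocessing code itself; the helper name `fits_in_signed_int` is made up for the illustration:

```python
import struct

def fits_in_signed_int(n: int) -> bool:
    """Return True if n can be packed as a four-byte signed integer ("i")."""
    try:
        struct.pack("!i", n)
        return True
    except struct.error:
        # Raised when n is outside -2147483648 .. 2147483647.
        return False

print(fits_in_signed_int(2**31 - 1))  # largest value that still fits
print(fits_in_signed_int(2**31))      # one byte more than 2 GiB - 1
```

A length prefix packed this way cannot describe a payload of 2 GiB or more.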

You could delegate reading of your dataframes to the child process instead, only sending across the metadata needed to load the dataframe. Their combined size is nearing 1GB, way too much data to share over a pipe between your processes.

Better to inherit than pickle/unpickle

When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.

If you are not running on Windows and use the fork start method (rather than spawn or forkserver), you could load your dataframes as globals before starting your subprocesses, at which point the child processes will 'inherit' the data via the normal OS copy-on-write memory page sharing mechanisms.

Note that this limit was raised for non-Windows systems in Python 3.8, to an unsigned long long (8 bytes), and so you can now send and receive 4 EiB of data. See this commit, and Python issues #35152 and #17560.

If you can't upgrade and you can't make use of resource inheriting, and are not running on Windows, then use this patch:


Debug Options

Enables memory error detection

Disables mouse grab (to interact with a debugger in some cases)

Keeps Python’s sys.stdin rather than setting it to None

Set debug value of <value> on startup.

Enable debug messages for the event system.

Enable debug messages from FFmpeg library.

Enable debug messages for event handling.

Enable debug messages from libmv library.

Enable debug messages from Cycles.

Enable fully guarded memory allocation and debugging.

Enable time profiling for background jobs.

Enable debug messages for Python.

Enable all debug messages from dependency graph.

Enable debug messages from dependency graph related to evaluation.

Enable debug messages from dependency graph related to graph construction.

Enable debug messages from dependency graph related to tagging.

Switch dependency graph to a single threaded evaluation.

Enable debug messages from dependency graph related to timing.

Enable colors for dependency graph debug messages.

Enable debug messages from dependency graph related to graph construction.

Enable debug messages for event handling.

Enable GPU debug context and information for OpenGL 4.3+.

Enable workarounds for typical GPU issues and disable all GPU extensions.

Enable debug messages for the window manager, shows all operators in search, shows keymap errors.

Enable debug messages for virtual reality contexts. Enables the OpenXR API validation layer, (OpenXR) debug messages and general information prints.

Enable debug messages for virtual reality frame rendering times.

Enable all debug messages.

Enable debug messages for I/O (Collada, …).

Enable floating-point exceptions.

Immediately exit when internal errors are detected.

Disable the crash handler.

Disable the abort handler.

Set the logging verbosity level for debug messages that support it.


Getting started

In the toolbar click to show the GEE Timeseries Explorer panel.

There are two main panels: The Plot Window on the left and the Collection and Visualization settings on the right.

The Collection tab on the right panel allows you to select a predefined Image Collection from the list and filter it by time interval and/or metadata properties. The subsection Collection Editor provides the python code used to access the imagery and enables full control on the code executed by the user, having access to the entire Earth Engine Data Catalogue (https://developers.google.com/earth-engine/datasets).

For a quick start, the code snippet for accessing the USGS Landsat 8 Surface Reflectance Tier 1 image collection is already prepared.

Click on the info button next to the collection list to open and inspect the USGS Landsat 8 Surface Reflectance Tier 1 description.

This section displays the spectral time series of the user defined collection and time frame. Click on Activate point selection tool in the upper left corner of the plugin and select at least one spectral band from the list. Here we use the predefined collection Landsat TM-ETM-OLI Surface Reflectance Cloud Masked which combines the Landsat sensors into one collection and applies masking based on quality bands (hint: check the Collection Editor for details). Set Filter Date to 1984-01-01 to 2020-12-31 to make use of the entire Landsat archive. Furthermore it can be useful to edit Filter Property to only consider scenes with less than 70% cloud cover.

Then, click into the map canvas to select a point location and to read the temporal profile data.

When changing the date range or filtered metadata properties, click on Read point profile to re-read the current location.

In the top panel, click on to retrieve the displayed time series as raw data:

It provides access to the raw data with information on the unique image id, the geographic coordinates, the date of acquisition and the spectral values for all bands selected.

Apart from plot based time series, you can visualize entire images and image aggregates. Open the Visualization tab on the right panel. In the subsection Temporal Binning you can specify a temporal window for which the visualisation will be rendered. By default, this corresponds to a single date, but can be expanded to a date range by increasing the Length parameters value and/or type (Day, Month, Year). The yellow vertical line (single date) or box (multi date) in the Plot Window illustrates the given choice graphically and can further be used to change the temporal window interactively by clicking inside the plot window.

Secondly, select the desired image visualization in the Band Rendering subsection. For RGB composites use Multiband as Render Type and select an input band for each color channel, e.g. SWIR2 as red band, NIR as green band and RED as blue band. By default, the Min / Max values are estimated using percentiles (default: 2% to 98%). Furthermore, we checked Show full map canvas extent to visualize all scenes acquired on the specified date that fall into our current map canvas extent.

Lastly, click Apply to estimate the values and make sure the image visibility is toggled on .

Also try to use the buttons in the upper right corner of the plugin to jump to the previous/next observation dates or time frames.

Visualize Temporally Aggregated Images

In the case of Landsat data it is useful not only to visualize data at a specific date, but also to aggregate multiple observations over a date range (e.g. the revisit time of 16 days) and to visualize observations from neighbouring overflight paths at the same time. Furthermore, aggregating time series into pixel-based statistics is useful for preserving variance whilst reducing the dimensionality of the data. Pixel-based calculations over time are referred to as Reducers in Earth Engine. For each band, you can select from a variety of statistical reducers and visualize them accordingly.
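To make the idea of a pixel-based reducer concrete, here is a plain-Python sketch of a median over time. It does not use the Earth Engine API; the function name `median_reduce`, the nested-list image layout and the use of `None` for masked pixels are all assumptions made for the illustration:

```python
from statistics import median

def median_reduce(stack):
    """Reduce a time series of images to one image of per-pixel medians.

    stack: list of images (one per acquisition date), each a list of rows
    of pixel values. None marks a masked (e.g. cloudy) pixel.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the valid observations of this pixel over time.
            values = [img[r][c] for img in stack if img[r][c] is not None]
            out_row.append(median(values) if values else None)
        result.append(out_row)
    return result

# Three 2x2 "images" of the same location on different dates.
stack = [
    [[1, 2], [3, None]],
    [[5, 2], [1, None]],
    [[3, 8], [2, None]],
]
composite = median_reduce(stack)
print(composite)
```

The output has the spatial shape of a single image, while the time dimension has been collapsed into one statistic per pixel.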

For example, let’s visualize the median of the selected band combination for our entire map canvas using all imagery acquired in 2018:

Often we need time series of spectral bands or indices for multiple locations in space. GEE-TSE allows you to specify a point layer loaded in QGIS for which the time series of the selected bands and time frame can be downloaded to CSV files. To do so, select a point layer as input in the Point Browser panel at the bottom of the plugin and click in the lower right corner. Specify a target folder in which the CSV files for each feature of the point layer are stored.

Furthermore, you can select attributes of the point layer and use the list or the buttons to reload the plot and image visualization for the given feature.



  • Latest Release: 1.26.0
    released on 05 Apr 2021
  • Releases: 62
  • Stars: 86
  • Issues: 18 open, 200 closed
  • Last push: 05 Apr 2021
  • OctoPrint: 1.3.10+, 1.4.0+
  • Operating Systems: all
  • Python: >=2.7,<4




0.28.1 (2021-06-07)

  • Upgrade Leaflet to 1.7.1.
  • Deprecate ‘LeafletWidget._get_attrs’ in favor of ‘LeafletWidget.get_attrs’

0.28.0 (2021-04-15)

  • Support Django 3.1 and 3.2.
  • Drop support for Django 1.11, 2.0, and 2.1.
  • Translate to Persian #313

0.27.1 (2020-07-31)

0.27.0 (2020-07-03)

  • Drop support for Django < 1.11 and Python 2
  • Upgrade proj4js to 2.6.1 and Proj4Leaflet to 1.0.2 (#287)
  • Update Czech translations, add Slovak translations #269
  • Add Arabic translation #274
  • Precision fixes #280 #291
  • Display the map on mobile (Fixes #241) #292
  • Updated Leaflet to 1.6.0

0.26.0 (2019-12-06)

0.25.0 (2019-10-18)

  • #225 changes in staticfiles for django 1.11.14
  • #247 Allow resizing of raw Geometry textbox input via CSS, improve label, add docs
  • #108 Add examples to docs on adding overlays, customising maps in templates, admin and forms
  • #248 Allow use of a custom widget in the Admin. (fixes #151)
  • #261 Add request to formfield_for_dbfield signature fix #260
  • #262 Fix Missing staticfiles manifest entry for ‘leaflet/images’

0.24.0 (2018-06-07)

0.23.0 (2017-11-28)

  • Fix fatal bug with Django >= 1.11.2 for non-GIS databases
  • fixes #188 Better replace for icon image
  • Add Russian translation
  • Add Hungarian translation
  • Allow storing global leaflet map instances

0.22.0 (2017-04-06)

  • Set a default max zoom in leaflet.forms _setView to avoid an error.
  • Fix the div ids to work with admin inlines.
  • Django 1.11 compatibility
  • Fix multipolyline/multipolygon and polyline/polygon not working

0.21.0 (2017-02-28)

0.20.0 (2017-01-27)

New features

  • Update Leaflet to 1.0.3 (#169)
  • Update Leaflet-draw to 0.4.0 (#169)
  • Update Proj4Leaflet to 1.0.0 (#169)
  • Made static calls lazy, to fix issues with non-default STATICFILES_STORAGE (#149)
  • Add example application (#168)
  • Use SpatiaLite library path from environment variable for running test (#173)
  • Fix max zoom level (#165)
  • Add SPATIAL_EXTENT default value to the default settings (#167)

Many thanks to @KostyaEsmukov, @cleder, @sikmir and @seav for their contributions!

0.19.0 (2016-08-22)

New features

  • Added leaflet.admin.LeafletGeoAdminMixin, useful for stacked or tabular inline forms (thanks @KostyaEsmukov, @Xowap)

0.18.2 (2016-08-16)

0.18.1 (2016-04-07)

  • If the TILES setting contains an empty list, no default tiles layer is generated (thanks @dyve).
  • Fix to allow multipoints saving (fixes #130, thanks @rukayaj)
  • Fix settings override (#142, thanks @ndufrane)
  • Fix for templatetags.leaflet_js debug setting (#148, thanks @arctelix)
  • Fixes for Django 1.10 compatibility (#138, thanks @PetrDiouhy)

0.18.0 (2016-01-04)

New features

  • Use a LazyEncoder to allow lazy translations in settings (#132, thanks @Mactory)
  • Enable settings_overrides also for admin (fixes #120, thanks @PetrDiouhy)
  • Add tests for Django 1.9 and Python 3.5 (thanks @itbabu)

0.17.1 (2015-12-16)

  • Update Leaflet to 0.7.7
  • Update Leaflet-draw to 0.2.4
  • Fix rendering of leaflet widget when initial value is an empty string

0.17.0 (2015-11-11)

New features

  • Pass relative URLs for static files through django.contrib.staticfiles (thanks @dyve, fixes #111)
  • Allow to override settings at the template tag level (thanks @PetrDiouhy, fixes #59)
  • Update Leaflet to 0.7.5 (@dyve)
  • Add Czech locale (thanks @PetrDiouhy)
  • Fix interaction with django-geojson (#106, thanks @batisteo)
  • Use protocol independent URLs in default OSM tiles (thanks @NotSqrt)
  • Fix deprecated TEMPLATE_DEBUG (#121, thanks @josenaka)
  • Fix errors with multi-word field names (#123, thanks @josemazo)
  • Fix loadevent not being taken into account in forms (#127, thanks @josemazo)

0.16.0 (2015-04-17)

New features

  • Add setting FORCE_IMAGE_PATH to bypass Leaflet guess on image paths (useful when using django-compressor) (thanks @nimasmi)
  • Add Hebrew translations (thanks @nonZero)
  • Map attribution can be translated using ugettext_lazy
  • Fix widgets hanging forever with points (thanks @Azimkhan, fixes #90)
  • Remove setTimeout when calling setView() (thanks @manelclos, fixes #89)
  • Fix minZoom/maxZoom when undefined in settings (thanks Manel Clos)

0.15.2 (2014-12-22)

0.15.1 (2014-12-04)

  • Remove special characters in README (fixes #82)
  • Fix translation in French (fixes #86)
  • Fix es localization

0.15.0 (2014-10-24)

0.14.2 (2014-10-24)

  • Fix Django 1.7 support in tests (thanks Marco Badan)
  • Add Spanish translations (thanks David Martinez)

0.14.1 (2014-07-30)

  • Fix draw events being received for each draw control on the map. (Caution: map.drawControl attribute is not set anymore)

0.14.0 (2014-07-29)

  • Fix GeoJSON serialization when creating new MultiPoint records
  • Make the only layer match the map max/min_zoom (fixes #67) (thanks Manel Clos)
  • Added widget attribute to edit several fields on the same map

0.13.7 (2014-06-26)

0.13.6 (2014-06-26)

  • Setup Projection machinery in Leaflet forms if necessary
  • Django Leaflet forms fields without libgeos installed (thanks Florent Lebreton)

0.13.5 (2014-06-18)

0.13.4 (2014-06-13)

0.13.3 (2014-06-10)

0.13.2 (2014-04-15)

0.13.1 (2014-04-10)

  • Fix GEOS dependency, back as optional for geometry edition only (fixes #65)
  • Add minZoom and maxZoom to map initialization
  • Add support of advanced static files locations, like S3 (thanks @jnm)

0.13.0 (2014-03-26)

0.12 (2014-03-22)

0.11.1 (2014-02-12)

0.11.0 (2014-02-07)

  • Add control of metric and imperial in SCALE option (thanks @smcoll)
  • Upgrade to Leaflet.draw 0.2.3

0.10.1 (2014-02-03)

0.10.0 (2014-01-22)

0.9.0 (2013-12-11)

  • Upgrade to Leaflet 0.7.1
  • Fix unsaved warning being always triggered on Internet Explorer.
  • Added DE locale (thanks @rosscdh)
  • Fix installation with python 2.6 (thanks @ollb)

0.8.5 (2013-11-05)

0.8.4 (2013-11-05)

0.8.3 (2013-11-05)

0.8.2 (2013-10-31)

  • Fix drawing of multi-polygon (fixes #37)
  • Fix attached data for events with jQuery fallback (fixes #38)
  • Fix Javascript syntax errors when using form prefixes (fixes #40)

0.8.1 (2013-09-30)

  • Fix Leaflet library inclusion with “plugins=ALL” outside Admin.
  • Do not include translations in every widgets outside Admin.
  • Fix syntax error if form widget translations contains quotes.
  • Fix dependency error if Leaflet is loaded after the form widget in the DOM.
  • Respect plugins declaration order using OrderedDicts
  • Prepend forms assets (instead of extend) if PLUGINS[‘forms’] already exists.

0.8.0 (2013-09-18)

  • Renamed Leaflet map fragment template
  • Leaflet map geometry widgets for adminsite and forms (requires Django 1.6)
  • Fix geometry type restriction in form fields (fixes #32)
  • Use jQuery for triggering events, only if CustomEvent constructor is not available (fixes #27, fixes #34)

0.7.4 (2013-08-28)

  • Fix projection download error if not available
  • Compute resolutions the same way TileCache does it, and provide example of TileCache configuration.
  • Raise ImproperlyConfigured if TILES_EXTENT is not portrait (since not supported)

0.7.3 (2013-08-23)

  • Do not use console() to warn about deprecated stuff if not available (<IE9)
  • Fix appearance of Reset view control for Leaflet 0.6
  • Add French and Italian locales

0.7.2 (2013-08-23)

0.7.1 (2013-08-21)

  • Fix map initialization with default tiles setting
  • Fix map fitBounds() to SPATIAL_EXTENT in settings

0.7.0 (2013-08-21)

Breaking changes

  • The leaflet_map template tag no longer registers initialization functions in global scope, and no longer adds map objects into window.maps array by default. Use LEAFLET_CONFIG['NO_GLOBALS'] = False to restore these features.
  • Initialization callback function no longer receives the map bounds in second argument, but the map options object.
  • JS default callback function ( <name>Init() ) for map initialization is deprecated. Use explicit callback parameter in template tag, or listen to window event map:init instead. (See Use Leaflet API section in README.)
  • TILES_URL entry in LEAFLET_CONFIG is deprecated. Use TILES instead.
  • Settings lookup is restricted to LEAFLET_CONFIG dict. Most notably, SRID, MAP_SRID and SPATIAL_EXTENT at global Django settings level are discouraged.
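As a rough sketch of what the bullets above imply for a project's settings module: the keys `TILES`, `SPATIAL_EXTENT` and `NO_GLOBALS` are the ones named in this changelog, while the tile URL and the extent values are placeholders chosen for the example, not recommended defaults.

```python
# settings.py (fragment)
LEAFLET_CONFIG = {
    # TILES replaces the deprecated TILES_URL entry.
    "TILES": "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",
    # Kept inside LEAFLET_CONFIG rather than at the global settings level.
    # Placeholder extent given as (xmin, ymin, xmax, ymax).
    "SPATIAL_EXTENT": (5.0, 44.0, 7.5, 46.0),
    # Set to False to restore the global init functions and window.maps.
    "NO_GLOBALS": False,
}
```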

New features

  • Add ability to associate layers attributions from settings
  • Add auto-include key for entries in PLUGINS setting, in order to implicitly load plugins with leaflet_css and leaflet_js tags.
  • Rewrote map initialization in a less flexible and less obtrusive way.
  • Use plugin system for Leaflet.MiniMap.
  • Add loadevent parameter to leaflet_map tag.
  • Map initialization is now idempotent, does nothing if map is already initialized.
  • Add ATTRIBUTION_PREFIX setting to control prefix globally.

0.6.0 (2013-08-08)

0.6.0a (2013-07-05)

  • Upgrade to Leaflet 0.6.2
  • Upgrade Leaflet.Minimap (rev 3cd58f7)
  • Upgrade Proj4Leaflet (rev f4f5b6d)

0.5.1 (2013-04-08)

  • Add minimap support
  • Drop Leaflet version switching
  • Update Leaflet to 0.5.1
  • Update Leaflet.Minimap
  • Fix appearance of Reset view button

0.4.1 (2012-11-05)

0.4.0 (2012-11-05)

0.3.0 (2012-10-26)

  • Remove max resolution setting since it can be computed
  • Allow scale control even if view is not set
  • Upgrade Leaflet to 0.4.5

0.2.0 (2012-09-22)

  • Fix packaging of templates
  • Use template for <head> fragment
  • Do not rely on spatialreference.org by default
  • Default settings for SRID
  • Default settings for map extent
  • Default map height
  • Default tiles base layer
  • map variable is not global anymore

0.1.0 (2012-08-13)

  • Initial support for map projection
  • Show zoom scale by default
  • Spatial extent configuration
  • Initialization callback instead of global JS variable
  • Leaflet version switching
  • Global layers configuration

0.0.2 (2012-03-22)

0.0.1 (2012-03-16)



The batch size defines the number of samples that will be propagated through the network.

For instance, let's say you have 1050 training samples and you want to set up a batch_size equal to 100. The algorithm takes the first 100 samples (from 1st to 100th) from the training dataset and trains the network. Next, it takes the second 100 samples (from 101st to 200th) and trains the network again. We can keep doing this until we have propagated all samples through the network. A problem might arise with the last set of samples. In our example, we've used 1050, which is not divisible by 100 without remainder. The simplest solution is just to take the final 50 samples and train the network on them.
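The splitting described above can be sketched in a few lines of plain Python. The helper name `batch_sizes` is made up for the illustration; frameworks do this bookkeeping internally:

```python
def batch_sizes(n_samples, batch_size):
    """Return the size of each consecutive batch, including a smaller last one."""
    return [
        min(batch_size, n_samples - start)
        for start in range(0, n_samples, batch_size)
    ]

sizes = batch_sizes(1050, 100)
print(len(sizes))   # number of batches (weight updates) per pass over the data
print(sizes[-1])    # size of the final, incomplete batch
```

With 1050 samples and a batch size of 100, this gives ten full batches and one last batch holding the remaining 50 samples.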

Advantages of using a batch size < number of all samples:

It requires less memory. Since you train the network using fewer samples, the overall training procedure requires less memory. That's especially important if you are not able to fit the whole dataset in your machine's memory.

Typically networks train faster with mini-batches. That's because we update the weights after each propagation. In our example we've propagated 11 batches (10 of them had 100 samples and 1 had 50 samples) and after each of them we've updated our network's parameters. If we used all samples during propagation we would make only 1 update for the network's parameter.

Disadvantages of using a batch size < number of all samples:

  • The smaller the batch the less accurate the estimate of the gradient will be. In the figure below, you can see that the direction of the mini-batch gradient (green color) fluctuates much more in comparison to the direction of the full batch gradient (blue color).

Stochastic is just a mini-batch with batch_size equal to 1. In that case, the gradient changes its direction even more often than a mini-batch gradient.

In the neural network terminology:

  • one epoch = one forward pass and one backward pass of all the training examples
  • batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
  • number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).

Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.

When solving an optimization problem on a CPU or a GPU, you iteratively apply an algorithm to some input data. In each of these iterations you usually update a metric of your problem by doing some calculations on the data. When the data is large, each iteration may need a considerable amount of time and may consume a lot of resources. So sometimes you choose to apply these iterative calculations to a portion of the data to save time and computational resources. This portion is the batch_size, and the process is called (in neural network lingo) batch data processing. When you apply your computations on all your data, then you do online data processing. I guess the terminology comes from the 60s, or even before. Does anyone remember the .bat DOS files? But of course the concept has come to mean a thread or portion of the data to be used.

The documentation for Keras about batch size can be found under the fit function in the Models (functional API) page

batch_size : Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.

If you have a small dataset, it would be best to make the batch size equal to the size of the training data. First try with a small batch then increase to save time. As itdxer mentioned, there's a tradeoff between accuracy and speed.

The question was asked a while ago, but I think people are still stumbling across it. For me it helped to know the mathematical background to understand batching and where the advantages/disadvantages mentioned in itdxer's answer come from. So please take this as a complementary explanation to the accepted answer.

Consider Gradient Descent as an optimization algorithm to minimize your Loss function $J(\theta)$. The updating step in Gradient Descent is given by

$\theta_{k+1} = \theta_{k} - \alpha \nabla J(\theta)$

For simplicity let's assume you only have 1 parameter ( $n=1$ ), but you have a total of 1050 training samples ( $m = 1050$ ) as suggested by itdxer.

Full-Batch Gradient Descent

In Batch Gradient Descent one computes the gradient for a batch of training samples first (represented by the sum in below equation, here the batch comprises all samples $m$ = full-batch) and then updates the parameter:

$\theta_{k+1} = \theta_{k} - \alpha \sum^m_{j=1} \nabla J_j(\theta)$

This is what is described in the wikipedia excerpt from the OP. For large number of training samples, the updating step becomes very expensive since the gradient has to be evaluated for each summand.

Stochastic Gradient Descent

In Stochastic Gradient Descent one computes the gradient for one training sample and updates the parameter immediately. These two steps are repeated for all training samples.

$\theta_{k+1} = \theta_{k} - \alpha \nabla J_j(\theta)$

One updating step is less expensive since the gradient is only evaluated for a single training sample j.

Difference between both approaches

Updating Speed: Batch gradient descent tends to converge more slowly because the gradient has to be computed for all training samples before updating. Within the same number of computation steps, Stochastic Gradient Descent already updated the parameter multiple times. But why should we then even choose Batch Gradient Descent?

Convergence Direction: Faster updating speed comes at the cost of lower "accuracy". Since in Stochastic Gradient Descent we only incorporate a single training sample to estimate the gradient it does not converge as directly as batch gradient descent. One could say, that the amount of information in each updating step is lower in SGD compared to BGD.

The less direct convergence is nicely depicted in itdxer's answer. Full-batch has the most direct route of convergence, whereas mini-batch or stochastic fluctuate a lot more. Also, with SGD it can theoretically happen that the solution never fully converges.

Memory Capacity: As pointed out by itdxer feeding training samples as batches requires memory capacity to load the batches. The greater the batch, the more memory capacity is required.

In my example I used Gradient Descent and no particular loss function, but the concept stays the same since optimization on computers basically always comprises iterative approaches.

So, by batching you have influence over training speed (smaller batch size) vs. gradient estimation accuracy (larger batch size). By choosing the batch size you define how many training samples are combined to estimate the gradient before updating the parameter(s).
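The trade-off can be made concrete with a small sketch in which the batch size is a parameter: a batch size of 1 corresponds to the stochastic variant and a batch size of $m$ to the full-batch variant. The function name `minibatch_gradient_descent`, the toy data and the least-squares loss are hypothetical choices for illustration:

```python
def minibatch_gradient_descent(xs, ys, batch_size, theta=0.0, alpha=0.01, epochs=100):
    """Return (theta, number_of_updates) for the given batch size."""
    updates = 0
    m = len(xs)
    for _ in range(epochs):
        for start in range(0, m, batch_size):
            batch = zip(xs[start:start + batch_size], ys[start:start + batch_size])
            # Combine the gradients of the samples in this batch, then update once.
            grad = sum((theta * x - y) * x for x, y in batch)
            theta -= alpha * grad
            updates += 1
    return theta, updates


# Illustrative data generated from y = 2 * x (assumed for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for batch_size in (1, 2, 4):
    theta, updates = minibatch_gradient_descent(xs, ys, batch_size)
    print(f"batch_size={batch_size}: theta={theta:.4f}, updates={updates}")
```

For the same number of passes over the data, the smaller batch size performs more parameter updates, while the larger one uses more samples per gradient estimate.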


Labeling Plots

As the last piece of this section, we’ll briefly look at the labeling of plots: titles, axis labels, and simple legends.

Titles and axis labels are the simplest such labels—there are methods that can be used to quickly set them (Figure 4-17):

Figure 4-17. Examples of axis labels and title
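The listing that belongs to this figure is not reproduced above. A minimal sketch of how such labels might be set, assuming a sine curve as sample data (the data and the label strings are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)  # sample data, assumed for this sketch

plt.plot(x, np.sin(x))
plt.title("A Sine Curve")  # title above the axes
plt.xlabel("x")            # label of the horizontal axis
plt.ylabel("sin(x)")       # label of the vertical axis
# Outside a notebook, call plt.show() to display the figure.
```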

You can adjust the position, size, and style of these labels using optional arguments to the function. For more information, see the Matplotlib documentation and the docstrings of each of these functions.

When multiple lines are being shown within a single axes, it can be useful to create a plot legend that labels each line type. Again, Matplotlib has a built-in way of quickly creating such a legend. It is done via the (you guessed it) plt.legend() method. Though there are several valid ways of using this, I find it easiest to specify the label of each line using the label keyword of the plot function (Figure 4-18):

Figure 4-18. Plot legend example
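The listing for this figure is likewise missing above. One way the label keyword and plt.legend() could be combined, assuming sine and cosine curves as sample data (the data, line styles and label strings are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)  # sample data, assumed for this sketch

# Each call to plt.plot() names its line through the label keyword.
plt.plot(x, np.sin(x), "-g", label="sin(x)")
plt.plot(x, np.cos(x), ":b", label="cos(x)")

# plt.legend() collects the labelled lines and draws the legend box.
legend = plt.legend()
# Outside a notebook, call plt.show() to display the figure.
```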

As you can see, the plt.legend() function keeps track of the line style and color, and matches these with the correct label. More information on specifying and formatting plot legends can be found in the plt.legend() docstring; additionally, we will cover some more advanced legend options in “Customizing Plot Legends”.

Matplotlib Gotchas

While most plt functions translate directly to ax methods (such as plt.plot() → ax.plot(), plt.legend() → ax.legend(), etc.), this is not the case for all commands. In particular, functions to set limits, labels, and titles are slightly modified. For transitioning between MATLAB-style functions and object-oriented methods, make the following changes:

plt.xlabel() → ax.set_xlabel()
plt.ylabel() → ax.set_ylabel()
plt.xlim() → ax.set_xlim()
plt.ylim() → ax.set_ylim()
plt.title() → ax.set_title()

In the object-oriented interface to plotting, rather than calling these functions individually, it is often more convenient to use the ax.set() method to set all these properties at once (Figure 4-19):

Figure 4-19. Example of using ax.set to set multiple properties at once
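The listing for this figure is not reproduced above. A minimal sketch of setting several properties through a single ax.set() call, assuming a sine curve as sample data (the data, limits and label strings are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)  # sample data, assumed for this sketch

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# One call replaces separate set_xlim, set_ylim, set_xlabel,
# set_ylabel and set_title calls.
ax.set(xlim=(0, 10), ylim=(-2, 2),
       xlabel="x", ylabel="sin(x)",
       title="A Simple Plot")
# Outside a notebook, call plt.show() to display the figure.
```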

A curated list of awesome Python frameworks, libraries, software and resources.

Libraries for administrative interfaces.

    - The admin panel your servers deserve. - A jazzy skin for the Django Admin-Interface. - Modern responsive template for the Django admin interface with improved functionality. - Alternative Django Admin-Interface (free only for Non-commercial use). - Drop-in replacement of Django admin comes with lots of goodies. - Simple and extensible administrative interface framework for Flask. - Real-time monitor and web admin for Celery. - Admin panel framework for any application with nice UI (ex Jet Django) - A Django app which creates automatic web UIs for Python scripts.

Algorithms and Design Patterns

Python implementation of data structures, algorithms and design patterns. Also see awesome-algorithms.

    Algorithms
      - Minimal examples of data structures and algorithms. - A collection of data structure and algorithms for coding interviews. - Fast and pure-Python implementation of sorted collections. - All Algorithms implemented in Python.
      - A simple yet effective library for implementing common design patterns. - A collection of design patterns in Python. - A lightweight, object-oriented finite state machine implementation.
      - A HTTP, HTTP2 and WebSocket protocol server for ASGI and ASGI-HTTP. - A lightning-fast ASGI server implementation, using uvloop and httptools.

    Libraries for manipulating audio and its metadata.

      Audio
        - Cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding. - Audio fingerprinting and recognition. - Keras Audio Preprocessors - Python library for audio and music analysis - A library for automated reference audio mastering. - An advanced music theory and notation package with MIDI file and playback support. - Audio feature extraction, classification, segmentation and applications. - Manipulate audio with a simple and easy high level interface. - Open web audio processing framework.
        - A music library manager and MusicBrainz tagger. - A tool for working with audio files, specifically MP3 files containing ID3 metadata. - A Python module to handle audio metadata. - A library for reading music meta data of MP3, OGG, FLAC and Wave files.

      Libraries for implementing authentication schemes.

        OAuth
          - JavaScript Object Signing and Encryption draft implementation. - Authentication app for Django that "just works." - OAuth 2 goodies for Django. - A generic and thorough implementation of the OAuth request-signing logic. - A fully tested, abstract interface to creating OAuth clients and servers. - An easy-to-setup social authentication mechanism.
          - JSON Web Token implementation in Python. - A JOSE implementation in Python. - A module for generating and verifying JSON Web Tokens.

        Compile software from source code.

          - A make-like build tool for embedded Linux. - A build system for creating, assembling and deploying applications from multiple parts. - A console tool to build code with different development platforms. - A continuous build tool written in pure Python. - A software construction tool.

        Built-in Classes Enhancement

        Libraries for enhancing Python built-in classes.

          - Replacement for __init__, __eq__, __repr__, etc. boilerplate in class definitions. - Efficient, Pythonic bidirectional map data structures and related functionality. - Python dictionaries with advanced dot notation access. - (Python standard library) Data classes. - A library that provides a method of accessing lists and dicts with a dotted path notation.

        Content Management Systems.

          - An open source enterprise CMS based on Django. - One of the most advanced Content Management Systems built on Django. - A feature-rich event management system, made @ CERN. - A high-level, Pythonic web application framework built on Pyramid. - A powerful, consistent, and flexible content management platform. - A CMS built on top of the open source application server Zope. - Flexible, extensible, small CMS powered by Flask and MongoDB. - A Django content management system.

        Libraries for caching data.

          - A WSGI middleware for sessions and caching. - Automatic caching and invalidation for Django models. - A slick ORM cache with automatic granular event-driven invalidation. - dogpile.cache is next generation replacement for Beaker made by same authors. - Python caching library with tag-based invalidation and dogpile effect prevention. - A Python wrapper around the libmemcached interface. - SQLite and file backed cache backend with faster lookups than memcached and redis.

        Libraries for chatbot development.

        Tools of static analysis, linters and code quality checkers. Also see awesome-static-analysis.

          Code Analysis
            - Language independent and easily extendable code analysis application. - Turn your Python and JavaScript code into DOT flowcharts. - A tool to analyse Python code. - A library that visualises the flow (call graph) of your Python application. - A tool for finding and analysing dead Python code.
            - A wrapper around pycodestyle , pyflakes and McCabe.
            - The uncompromising Python code formatter. - A Python utility / library to sort imports. - Yet another Python code formatter from Google.
            - Check variable types during compile time. - Performant type checking. - Collection of library stubs for Python, with static types.
            - A system for Python that generates static type annotations by collecting runtime types. - Auto-generate PEP-484 annotations. - Pytype checks and infers types for Python code - without requiring type annotations.

          Command-line Interface Development

          Libraries for building command-line applications.

            Command-line Application Development
              - CLI Application Framework for Python. - A package for creating beautiful command line interfaces in a composable way. - A framework for creating command-line programs with multi-level commands. - Pythonic command line arguments parser. - A library for creating command line interfaces from absolutely any Python object. - A library for building powerful interactive command lines.
              - A new kind of Progress Bar, with real-time throughput, eta and very cool animations. - A package to create full-screen text UIs (from interactive forms to ASCII animations). - Making basic plots in the terminal. - Cross-platform colored terminal text. - Python library for rich text and beautiful formatting in the terminal. Also provides a great RichHandler log handler. - Fast, extensible progress bar for loops and CLI.

            Useful CLI-based tools for productivity.

              Productivity Tools
                - A library and command-line utility for rendering projects templates. - A command-line utility that creates projects from cookiecutters (project templates). - A tool for live presentations in the terminal. - Instant coding answers via the command line. - A tool for managing shell-oriented subprocesses and organizing executable Python code into CLI-invokable tasks. - Select files out of bash output. - Adds flavor of interactive selection to the traditional pipe concept on UNIX. - Correcting your previous console command. - A tmux session manager. - A dead simple CLI to try out python packages - it's never been easier.
                - A command line HTTP client, a user-friendly cURL replacement. - Redis CLI with autocompletion and syntax highlighting. - An integrated shell for working with the Kubernetes CLI. - SQLite CLI with autocompletion and syntax highlighting. - MySQL CLI with autocompletion and syntax highlighting. - PostgreSQL CLI with autocompletion and syntax highlighting. - A Supercharged aws-cli.

              Libraries for migrating from Python 2 to 3.

                - The missing compatibility layer between Python 2 and Python 3. - Modernizes Python code for eventual Python 3 migration. - Python 2 and 3 compatibility utilities.

              Libraries for Computer Vision.

                - Ready-to-use OCR with 40+ languages supported. - Simple facial recognition library. - Open Source Differentiable Computer Vision Library for PyTorch. - Open Source Computer Vision Library. - A wrapper for Google Tesseract OCR. - An open source framework for building computer vision applications. - Another simple, Pillow-friendly, wrapper around the tesseract-ocr API for OCR.

              Concurrency and Parallelism

              Libraries for concurrent and parallel execution. Also see awesome-asyncio.

                - (Python standard library) A high-level interface for asynchronously executing callables. - Asynchronous framework with WSGI support. - A coroutine-based Python networking library that uses greenlet. - (Python standard library) Process-based parallelism. - Scalable Concurrent Operations in Python. - Ultra fast implementation of asyncio event loop on top of libuv .

              Libraries for storing and parsing configuration options.

                - INI file parser with validation. - (Python standard library) INI file parser. - Hydra is a framework for elegantly configuring complex applications. - Config from multiple formats with value conversion. - Strict separation of settings from code.
                - A package designed to expose cryptographic primitives and recipes to Python developers. - The leading native Python SSHv2 protocol library. - Secure password storage/hashing library, very high level. - Python binding to the Networking and Cryptography (NaCl) library.

              Libraries for data analysis.

                - Pandas on AWS. - NumPy and Pandas interface to Big Data. - Business Intelligence (BI) in Pandas interface. - Agile Data Science Workflows made easy with PySpark. - Data mining, data visualization, analysis and machine learning through visual programming or scripts. - A library providing high-performance, easy-to-use data structures and data analysis tools.

              Libraries for validating data. Used for forms in many cases.

                - A lightweight and extensible data validation library. - Validating and deserializing data obtained via XML, JSON, an HTML form post. - An implementation of JSON Schema for Python. - A library for validating Python data structures. - Data Structure Validation. - Lightweight extensible data validation and adaptation library. - A Python data validation library.

              Libraries for visualizing data. Also see awesome-javascript.

                - Declarative statistical visualization library for Python. - Interactive Web Plotting for Python. - Interactive Plotting Library for the Jupyter Notebook - A cartographic python library with matplotlib support - Built on top of Flask, React and Plotly aimed at analytical web applications.

              Databases implemented in Python.

                - A simple and lightweight key-value store for Python. - A tiny, document-oriented database. - A native object database for Python. A key-value and object graph database.

              Libraries for connecting and operating databases.

                MySQL - awesome-mysql
                  - MySQL connector with Python 3 support (mysql-python fork). - A pure Python MySQL driver compatible to mysql-python.
                  - The most popular PostgreSQL adapter for Python. - A wrapper of the psycopg2 library for interacting with PostgreSQL.
                  - (Python standard library) SQlite interface compliant with DB-API 2.0 - A supercharged SQLite library built on top of apsw.
                  - A simple database interface to Microsoft SQL Server. - Python driver with native interface for ClickHouse.
                  - The Python Driver for Apache Cassandra. - A developer-friendly library for Apache HBase. - The Python client for Apache Kafka. - A client library and toolkit for working with Neo4j. - The official Python client for MongoDB. - The Python client for Redis.
                  - The async Python driver for MongoDB.

                Libraries for working with dates and times.

                  - A Python library that offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps. - A Python 3 library for parsing human-written times and dates. - Extensions to the standard Python datetime module. - A library for clearing up the inconvenient truths that arise dealing with datetimes. - Datetimes for Humans. - A Python library for dealing with dates/times. Inspired by Moment.js. - Python datetimes made easy. - An easy-to-use Python module which aims to operate date/time/datetime by string. - World timezone definitions, modern and historical. Brings the tz database into Python. - Providing user-friendly functions to help perform common date and time actions.

                Libraries for debugging code.

                  pdb-like Debugger
                    - IPython-enabled pdb. - Another drop-in replacement for pdb. - A full-screen, console-based Python debugger. - An improbable web debugger through WebSockets.
                    - strace for Python programs. - Debugging UNIX socket connections and present the stacktraces for all threads and an interactive prompt. - Debugger capable of attaching to and injecting code into Python processes. - A flexible code tracing toolkit.
                    - Line-by-line profiling. - Monitor Memory usage of Python code. - A sampling profiler for Python programs. Written in Rust. - A ptracing profiler For Python. - Visual Python profiler.
                    - Display various debug information for Django. - A drop-in replacement for Django's runserver. - A port of the django-debug-toolbar to flask. - Inspect variables, expressions, and program execution with a single, simple function call. - Parsing and analyzing ELF files and DWARF debugging information.

                  Frameworks for Neural Networks and Deep Learning. Also see awesome-deep-learning.

                    - A fast open framework for deep learning. - A high-level neural networks library capable of running on top of either TensorFlow or Theano. - A deep learning framework designed for both efficiency and flexibility. - Tensors and Dynamic neural networks in Python with strong GPU acceleration. - Game agent framework. Use any video game as a deep learning sandbox. - The most popular Deep Learning framework created by Google. - A library for fast numerical computation.

                  Software and libraries for DevOps.

                    Configuration Management
                      - A radically simple IT automation platform. - A multi-distribution package that handles early initialization of a cloud instance. - Open source software for building private and public clouds. - A versatile CLI tools and python libraries to automate infrastructure. - Infrastructure automation and management system.
                      - Chef-like functionality for Fabric. - A simple, Pythonic tool for remote execution and deployment. - Tools for writing awesome Fabric files.
                      - A Python clone of Foreman, for managing Procfile-based applications. - Supervisor process control system for UNIX.
                      - A cross-platform process and system utilities module.
                      - A deduplicating archiver with compression and encryption.
                      - Fast, isolated development environments using Docker.

                    Frameworks and libraries for Distributed Computing.

                      Batch Processing
                        - A flexible parallel computing library for analytic computing. - A module that helps you build complex pipelines of batch jobs. - Run MapReduce jobs on Hadoop or Amazon Web Services. - Apache Spark Python API. - A system for parallel and distributed Python that unifies the machine learning ecosystem.
                        - A stream processing library, porting the ideas from Kafka Streams to Python. - Run Python code against real-time streams of data via Apache Storm.

                      Libraries to create packaged executables for release distribution.

                        - Build and distribute a virtualenv as a Debian package. - Compile scripts, modules, packages to an executable or extension module. - Freezes Python scripts (Mac OS X). - Freezes Python scripts (Windows). - A tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts. - Converts Python programs into stand-alone executables (cross-platform). - A tool to build Windows installers, installers bundle Python itself. - A command line utility for building fully self-contained zipapps (PEP 441), but with all their dependencies included.

                      Libraries for generating project documentation.

                      Libraries for downloading.

                        - A financial data interface library, built for human beings! - A command line tool for managing Amazon S3 and CloudFront. - Super S3 command line tool, good for higher performance. - A YouTube/Youku/Niconico video downloader written in Python 3. - A small command-line program to download videos from YouTube.

                      Frameworks and libraries for e-commerce and payments.

                        - Unofficial Alipay API for Python. - A shopping cart app built using the Mezzanine. - An open-source e-commerce framework for Django. - A Django based shop system. - Foreign exchange rates, Bitcoin price index and currency conversion. - A Django app to accept payments from various payment processors. - Money class with optional CLDR-backed locale-aware formatting and an extensible currency exchange. - Display money format and its filthy currencies. - An e-commerce storefront for Django. - An open source E-Commerce platform based on Django.
                        Emacs
                          - Emacs Python Development Environment.
                          - Anaconda turns your Sublime Text 3 in a full featured Python development IDE. - A Sublime Text plugin to the awesome auto-complete library Jedi.
                          - Vim bindings for the Jedi auto-completion library for Python. - An all in one plugin for turning Vim into a Python IDE. - Includes Jedi-based completion engine for Python.
                          - Python Tools for Visual Studio.
                          - The official VSCode extension with rich support for Python.
                          - Commercial Python IDE by JetBrains. Has free community edition available. - Open Source Python IDE.

                        Libraries for sending and parsing email.

                          Mail Servers
                            - A mail hosting and management platform including a modern Web UI. - A Python Mail Server.
                            - Python IMAP for Humans. - Yet another Gmail/SMTP client.
                            - An email address and Mime parsing library. - High-performance extensible mail delivery framework.

                          Enterprise Application Integrations

                          Platforms and tools for systems integrations in enterprise environments

                          Libraries for Python version and virtual environment management.

                          Libraries for file manipulation and MIME type detection.

                            - (Python standard library) Map filenames to MIME types. - A module wrapper for os.path. - (Python standard library) A cross-platform, object-oriented path library. - Python's filesystem abstraction layer. - A Python interface to the libmagic file type identification library. - An object-oriented approach to file/directory operations. - API and shell utilities to monitor file system events.

                          Foreign Function Interface

                          Libraries for providing foreign function interface.

                            - Foreign Function Interface for Python calling C code. - (Python standard library) Foreign Function Interface for Python calling C code. - A Python wrapper for Nvidia's CUDA API. - Simplified Wrapper and Interface Generator.

                          Libraries for working with forms.

                            - Python HTML form generation library influenced by the formish form generation library. - Bootstrap 3 integration with Django. - Bootstrap 4 integration with Django. - A Django app which lets you create beautiful forms in a very elegant and DRY way. - A platform independent Django form serializer. - A flexible forms validation and rendering library.

                          Functional Programming with Python.

                            - A variant of Python built for simple, elegant, Pythonic functional programming. - Cython implementation of Toolz: High performance functional utilities. - Functional programming in Python: implementation of missing features to enjoy FP. - Fancy and practical functional tools. - More routines for operating on iterables, beyond itertools. - A set of type-safe monads, transformers, and composition utilities. - A collection of functional utilities for iterators, functions, and dictionaries.

                          Libraries for working with graphical user interface applications.

                            - Built-in wrapper for ncurses used to create terminal GUI applications. - A library for making simple Electron-like offline HTML/JS GUI apps. - Creating beautiful user-interfaces with Declarative Syntax like QML. - Flexx is a pure Python toolkit for creating GUI's, that uses web technology for its rendering. - Turn command line programs into a full GUI application with one line. - A library for creating NUI applications, running on Windows, Linux, Mac OS X, Android and iOS. - A cross-platform windowing and multimedia library for Python. - Python Bindings for GLib/GObject/GIO/GTK+ (GTK+3). - Python bindings for the Qt cross-platform application and UI framework. - Wrapper for tkinter, Qt, WxPython and Remi. - A lightweight cross-platform native wrapper around a webview component. - Tkinter is Python's de-facto standard GUI package. - A Python native, OS native GUI toolkit. - A library for creating terminal GUI applications with strong support for widgets, events, rich colors, etc. - A blending of the wxWidgets C++ class library with the Python. - A Simple GPU accelerated Python GUI framework

                          Libraries for working with GraphQL.

                            - GraphQL framework for Python. - An aiohttp -based wrapper for Tartiflette to expose GraphQL APIs over HTTP. - ASGI support for the Tartiflette GraphQL engine. - SDL-first GraphQL engine implementation for Python 3.6+ and asyncio.

                          Awesome game development libraries.

                            - Arcade is a modern Python framework for crafting games with compelling graphics and sound. - cocos2d is a framework for building 2D games, demos, and other graphical/interactive applications. - Python framework for 3D, VR and game development. - 3D game engine developed by Disney. - Pygame is a set of Python modules designed for writing games. - Python bindings for the Ogre 3D render engine, can be used for games, simulations, anything 3D. - Python ctypes bindings for OpenGL and its related APIs. - A ctypes based wrapper for the SDL2 library. - A Visual Novel engine.

                          Libraries for geocoding addresses and working with latitudes and longitudes.

                            - A Django app that provides a country field for models and forms. - A world-class geographic web framework. - Python API for MaxMind GeoIP Legacy Database. - Python bindings and utilities for GeoJSON. - Python Geocoding Toolbox.

                          Libraries for working with HTML and XML.

                            - Providing Pythonic idioms for iterating, searching, and modifying HTML or XML. - A whitelist-based HTML sanitization and text linkification library. - A CSS library for Python. - A standards-compliant library for parsing and serializing HTML documents and fragments. - A very fast, easy-to-use and versatile library for handling HTML and XML. - Implements a XML/HTML/XHTML Markup safe string for Python. - A jQuery-like library for parsing HTML. - Converts XML documents to Python objects for easy access. - A visual rendering engine for HTML and CSS that can export to PDF. - Simple XML Parsing. - Working with XML feel like you are working with JSON.

                          Libraries for working with HTTP.

                            - requests + gevent for asynchronous HTTP requests. - Comprehensive HTTP client library. - A next generation HTTP client for Python. - HTTP Requests for Humans. - Python requests like API built on top of Twisted's HTTP client. - A HTTP library with thread-safe connection pooling, file post support, sanity friendly.

                          Libraries for programming with hardware.

                            - Command line toolkit for working with Arduino. - Hook and simulate global keyboard events on Windows and Linux. - Hook and simulate global mouse events on Windows and Linux. - Pingo provides a uniform API to program devices like the Raspberry Pi, pcDuino, Intel Galileo, etc. - A module for cross-platform control of the mouse and keyboard. - A brilliant packet manipulation library.

                          Libraries for manipulating images.

                            - Image histogram remapping. - A project for searching a collection of images using visual similarity. - Nudity detection. - Retro identicon (Avatar) generation based on input string and hash. - Pillow is the friendly PIL fork. - Create barcodes in Python with no extra dependencies. - Instagram-like image filters. - A library for alpha matting. - A pure Python QR Code generator. - A tool that generates color schemes from images. - A fast image processing library with low memory needs. - Computer art based on quadtrees. - A Python library for (scientific) image processing. - A smart imaging service. It enables on-demand crop, re-sizing and flipping of images. - Python bindings for MagickWand, C API for ImageMagick.

                          Implementations of Python.

                            - Implementation of the Python programming language written in Common Lisp. - Default, most widely used implementation of the Python programming language written in C. - Optimizing Static Compiler for Python. - More compiler than interpreter as more powerful CPython2.7 replacement (alpha). - Implementation of the Python programming language written in C#. - Implementation of Python programming language written in Java for the JVM. - A lean and efficient Python programming language implementation. - Python JIT compiler to LLVM aimed at scientific Python. - x86-64 assembler embedded in Python. - A JIT for Python based upon CoreCLR. - A very fast and compliant implementation of the Python language. - A Python implementation using JIT techniques. - An enhanced version of the Python programming language.

                          Interactive Python interpreters (REPL).

                            - A fancy interface to the Python interpreter. - A rich toolkit to help you make the most out of using Python interactively.

                          Libraries for working with i18n.

                            - An internationalization library for Python. - A wrapper of International Components for Unicode C++ library (ICU).

                          Libraries for scheduling jobs.

                            - Airflow is a platform to programmatically author, schedule and monitor workflows. - A light but powerful in-process task scheduler that lets you schedule functions. - A calendaring app for Django. - A task runner and build tool. - Multipurpose task execution tool for distributed systems with web-based interface. - A set of tools to provide lightweight pipelining in Python. - Writing crontab file in Python like a charm. - A modern workflow orchestration framework that makes it easy to build, schedule and monitor robust data pipelines. - Python job scheduling for humans. - A powerful workflow engine implemented in pure Python. - A Python library that helps to make task execution easy, consistent and reliable.

                          Libraries for generating and working with logs.

                            - Logging replacement for Python. - (Python standard library) Logging facility for Python. - Library which aims to bring enjoyable logging in Python. - Sentry SDK for Python. - Structured logging made easy.

                          Libraries for Machine Learning. Also see awesome-machine-learning.

                            - A toolkit for developing and comparing reinforcement learning algorithms. - Open Source Fast Scalable Machine Learning Platform. - Machine learning evaluation metrics. - Numenta Platform for Intelligent Computing. - The most popular Python library for Machine Learning. - Apache Spark's scalable Machine Learning library. - A lightweight Python wrapper for Vowpal Wabbit. - A scalable, portable, and distributed gradient boosting library. - MindsDB is an open source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning models using standard queries.

                          Python programming on Microsoft Windows.

                            - Scientific-applications-oriented Python Distribution based on Qt and Spyder.
                            - Unofficial Windows binaries for Python extension packages.
                            - Python Integration with the .NET Common Language Runtime (CLR).
                            - Python Extensions for Windows.
                            - Portable development environment for Windows 7/8.

                          Useful libraries or tools that don't fit in the categories above.

                            - A fast Python in-process signal/event dispatching system.
                            - A set of pure-Python utilities.
                            - Various helpers to pass trusted data to untrusted environments.
                            - A tool to generate music and art using artificial intelligence.
                            - A simple but flexible plugin system for Python.
                            - A general purpose business framework.

                          Natural Language Processing

                          Libraries for working with human languages.

                            General
                              - Topic Modeling for Humans.
                              - Stand-alone language identification system.
                              - A leading platform for building Python programs to work with human language data.
                              - A web mining module.
                              - Natural language pipeline supporting hundreds of languages.
                              - A natural language modeling framework based on PyTorch.
                              - A toolkit enabling rapid deep learning NLP prototyping for research.
                              - A library for industrial-strength natural language processing in Python and Cython.
                              - The Stanford NLP Group's official Python library, supporting 60+ languages.

                              - A collection of tools and datasets for Chinese NLP.
                              - The most popular Chinese text segmentation library.
                              - A toolkit for Chinese word segmentation in various domains.
                              - A library for processing Chinese text.

                            Tools and libraries for Virtual Networking and SDN (Software Defined Networking).

                              - A popular network emulator and API written in Python.
                              - Cross-vendor API to manipulate network devices.
                              - Python-based SDN control applications, such as OpenFlow SDN controllers.

                            Libraries for building user activity streams.

                              - Generating generic activity streams from the actions on your site.
                              - Building news feed and notification systems using Cassandra and Redis.

                            Libraries that implement Object-Relational Mapping or data mapping techniques.

                              Relational Databases
                                - The Django ORM.
                                - The Python SQL Toolkit and Object Relational Mapper.

                                - Rich Python data types for Redis.
                                - A Python Object-Document-Mapper for working with MongoDB.
                                - A Pythonic interface for Amazon DynamoDB.
                                - A Python Library for Simple Models and Containers Persisted in Redis.

                              Libraries for package and dependency management.

                              Local PyPI repository server and proxies.

                                - PyPI mirroring tool provided by Python Packaging Authority (PyPA).
                                - PyPI server and packaging/testing/release tool.
                                - Local PyPI server (custom packages and auto-mirroring of pypi).
                                - Next generation Python Package Repository (PyPI).

                              Frameworks and tools for penetration testing.

                                - A Penetration testing framework.
                                - A toolkit for social engineering.
                                - Automatic SQL injection and database takeover tool.

                              Libraries that allow or deny users access to data or functionality.

                                - Implementation of per-object permissions for Django 1.2+.
                                - A tiny but powerful app providing object-level permissions to Django, without requiring a database.

                              Libraries for starting and communicating with OS processes.

                              Libraries for building recommender systems.

                                - Approximate Nearest Neighbors in C++/Python optimized for memory usage.
                                - A library for Factorization Machines.
                                - A fast Python implementation of collaborative filtering for implicit datasets.
                                - A library for Field-aware Factorization Machine (FFM).
                                - A Python implementation of a number of popular recommendation algorithms.
                                - Deep recommender models using PyTorch.
                                - A scikit for building and analyzing recommender systems.
                                - A Recommendation Engine Framework in TensorFlow.

                              Refactoring tools and libraries for Python.

                                - Bicycle Repair Man, a refactoring tool for Python.
                                - Safe code refactoring for modern Python.
                                - Rope is a Python refactoring library.

                              Libraries for building RESTful APIs.

                                Django
                                  - A powerful and flexible toolkit to build web APIs.
                                  - Creating delicious APIs for Django apps.

                                  - REST API framework powered by Flask, MongoDB and good intentions.
                                  - Browsable Web APIs for Flask.
                                  - Quickly building REST APIs for Flask.

                                  - A RESTful framework for Pyramid.

                                  - A smart Web API framework, designed for Python 3.
                                  - A high-performance framework for building cloud APIs and web app backends.
                                  - A modern, fast, web framework for building APIs with Python 3.6+ based on standard Python type hints.
                                  - A Python 3 framework for cleanly exposing APIs.
                                  - Automated REST APIs for existing database-driven systems.
                                  - A Python 3.6+ web server and web framework that's written to go fast.
                                  - Fast, efficient and asynchronous Web framework inspired by Flask.

                                  - A compilation of various robotics algorithms with visualizations.
                                  - A library for ROS (Robot Operating System).

                                  - (Remote Python Call) A transparent and symmetric RPC library for Python.
                                  - zerorpc is a flexible RPC implementation based on ZeroMQ and MessagePack.

                                Libraries for scientific computing. Also see Python-for-Scientists.

                                  - A community Python library for Astronomy.
                                  - Providing best-practice pipelines for fully automated high throughput sequencing analysis.
                                  - Collection of useful code related to biological analysis.
                                  - Biopython is a set of freely available tools for biological computation.
                                  - A library for parsing and interpreting the results of computational chemistry packages.
                                  - Implementing a comprehensive number of colour theory transformations and algorithms.
                                  - Unsupervised machine learning toolbox for graph structured data.
                                  - High-productivity software for complex networks.
                                  - A collection of neuroimaging toolkits.
                                  - A fundamental package for scientific computing with Python.
                                  - A Python toolbox for seismology.
                                  - A chemical toolbox designed to speak the many languages of chemical data.
                                  - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion.
                                  - Markov Chain Monte Carlo sampling toolkit.
                                  - Quantum Toolbox in Python.
                                  - Cheminformatics and Machine Learning Software.
                                  - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
                                  - A process-based discrete-event simulation framework.
                                  - Statistical modeling and econometrics in Python.
                                  - A Python library for symbolic mathematics.
                                  - A Pythonic algorithmic trading library.

                                Libraries and software for indexing and performing search queries on data.

                                  - Modular search for Django.
                                  - The official high-level Python client for Elasticsearch.
                                  - The official low-level Python client for Elasticsearch.
                                  - A lightweight Python wrapper for Apache Solr.
                                  - A fast, pure Python search engine library.

                                Libraries for serializing complex data types.

                                  - A lightweight library for converting complex objects to and from simple Python datatypes.
                                  - Python bindings for simdjson.
                                  - A Python wrapper around RapidJSON.
                                  - A fast JSON decoder and encoder written in C with Python bindings.

                                Frameworks for developing serverless Python code.

                                  - A toolkit for developing and deploying Python code in AWS Lambda.
                                  - A tool for deploying WSGI applications on AWS Lambda and API Gateway.

                                Specific Formats Processing

                                Libraries for parsing and manipulating specific text formats.

                                  General
                                    - A module for Tabular Datasets in XLS, CSV, JSON, YAML.

                                    - Editing a docx document by jinja2 template.
                                    - A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
                                    - Providing one API for reading, manipulating and writing csv, ods, xls, xlsx and xlsm files.
                                    - Reads, queries and modifies Microsoft Word 2007/2008 docx files.
                                    - Python library for creating and updating PowerPoint (.pptx) files.
                                    - Convert between any document format supported by LibreOffice/OpenOffice.
                                    - A Python module for creating Excel .xlsx files.
                                    - A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
                                    - / xlrd - Writing and reading data and formatting information from Excel files.

                                    - A tool for extracting information from PDF documents.
                                    - A library capable of splitting, merging and transforming PDF pages.
                                    - Allowing rapid creation of rich PDF documents.

                                    - A fast, full-featured pure Python parser for Markdown.
                                    - A Python implementation of John Gruber’s Markdown.

                                    - YAML implementations for Python.

                                    - Utilities for converting to and working with CSV.

                                    - A command line tool that can unpack archives easily.

                                  A static site generator is software that takes text and templates as input and produces HTML files as output.

                                    - An easy to use static CMS and blog engine.
                                    - Markdown friendly documentation generator.
                                    - Simple, lightweight, and magic-free static site/blog generator (< 130 lines).
                                    - A static website and blog generator.
                                    - Static site generator that supports Markdown and reST syntax.

                                  Libraries for tagging items.

                                  Libraries for working with task queues.

                                    - An asynchronous task queue/job queue based on distributed message passing.
                                    - A fast and reliable background task processing library for Python 3.
                                    - Little multi-threaded task queue.
                                    - A distributed worker task queue in Python using Redis & gevent.
                                    - Simple job queues for Python.

                                  Libraries and tools for templating and lexing.

                                    - Python templating toolkit for generation of web-aware output.
                                    - A modern and designer friendly templating language.
                                    - Hyperfast and lightweight templating for the Python platform.

                                  Libraries for testing codebases and generating test data.

                                    Testing Frameworks
                                      - Hypothesis is an advanced Quickcheck style property based testing library.
                                      - The successor to nose, based on unittest2.
                                      - A mature full-featured Python testing tool.
                                      - A generic test automation framework.
                                      - (Python standard library) Unit testing framework.

                                      - A clean, colorful test runner.
                                      - The definitive testing tool for Python. Born under the banner of BDD.
                                      - Auto builds and tests distributions in multiple Python versions.

                                      - Scalable user load testing tool written in Python.
                                      - PyAutoGUI is a cross-platform GUI automation Python module for human beings.
                                      - A tool for automatic property-based testing of web applications built with Open API / Swagger specifications.
                                      - Python bindings for Selenium WebDriver.
                                      - A language-agnostic A/B Testing framework.
                                      - Open source tool for testing web applications.

                                      - Powerful test doubles framework for Python.
                                      - Travel through time by mocking the datetime module.
                                      - A mocking library for requests for Python 2.6+ and 3.2+.
                                      - HTTP request mock tool for Python.
                                      - (Python standard library) A mocking and patching library.
                                      - A socket mock framework with gevent/asyncio/SSL support.
                                      - A utility library for mocking out the requests Python library.
                                      - Record and replay HTTP interactions on your tests.

                                      - A test fixtures replacement for Python.
                                      - Another fixtures replacement. Supports Django, Flask, SQLAlchemy, Peewee, etc.
                                      - Creating random fixtures for testing in Django.

                                      - Code coverage measurement.

                                      - Fake database generator.
                                      - A Python package that generates fake data.
                                      - A Python library that helps you generate fake data.
                                      - Generate random datetime / time.
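
To show how the two standard-library entries above (the unit testing framework and the mocking library) fit together, here is a minimal sketch using `unittest` and `unittest.mock`. The function `fetch_greeting`, its `client` argument, and the `/greet` path are hypothetical and exist only for this example.

```python
import unittest
from unittest import mock


def fetch_greeting(client, name):
    """Hypothetical function under test: asks `client` for a greeting."""
    response = client.get("/greet", params={"name": name})
    return response["text"].strip()


class FetchGreetingTest(unittest.TestCase):
    def test_strips_whitespace_and_calls_client_once(self):
        # A Mock stands in for the real client, so no network is needed.
        client = mock.Mock()
        client.get.return_value = {"text": "  hello, Ada \n"}

        self.assertEqual(fetch_greeting(client, "Ada"), "hello, Ada")
        client.get.assert_called_once_with("/greet", params={"name": "Ada"})


# Run the test case programmatically instead of via the command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FetchGreetingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a project the same test class would normally be discovered with `python -m unittest`; the explicit runner is used here only to keep the sketch self-contained.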

                                    Libraries for parsing and manipulating plain texts.

                                      General
                                        - Python 2/3 compatible character encoding detector.
                                        - (Python standard library) Helpers for computing deltas.
                                        - Makes Unicode text less broken and more consistent automagically.
                                        - Fuzzy String Matching.
                                        - Fast computation of Levenshtein distance and string similarity.
                                        - Paranoid text spacing.
                                        - An implementation of figlet written in Python.
                                        - Convert Chinese hanzi (漢字) to pinyin (拼音).
                                        - Compute distance between sequences with 30+ algorithms.
                                        - ASCII transliterations of Unicode text.

                                        - A Python slugify library that can preserve unicode.
                                        - A Python slugify library that translates unicode to ASCII.
                                        - A slugifier that generates unicode slugs with Django as a dependency.

                                        - Implementation of hashids in Python.
                                        - A generator library for concise, unambiguous and URL-safe UUIDs.

                                        - Implementation of lex and yacc parsing tools for Python.
                                        - A generic syntax highlighter.
                                        - A general purpose framework for generating parsers.
                                        - Parsing human names into their individual components.
                                        - Parsing, formatting, storing and validating international phone numbers.
                                        - Browser user agent parser.
                                        - A non-validating SQL parser.
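
For the standard-library "helpers for computing deltas" listed above, a minimal sketch with the `difflib` module might look like the following. The file names `old.txt` and `new.txt` and the sample word lists are invented for illustration.

```python
import difflib

old = ["alpha", "beta", "gamma"]
new = ["alpha", "beta2", "gamma", "delta"]

# unified_diff yields header lines followed by context, removed (-) and added (+) lines.
diff = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt", lineterm="")
)
print("\n".join(diff))

# get_close_matches ranks candidates by similarity to the query word.
matches = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(matches)
```

The third-party fuzzy-matching entries in the same list cover similar ground with different scoring and speed trade-offs; `difflib` is simply the option that needs no installation.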

                                      Libraries for accessing third party services APIs. Also see List of Python API Wrappers and Libraries.