I'm trying to make a basic Windows application that builds a string out of user input and then adds it to the clipboard. How do I copy a string to the clipboard using Python?

I just need a python script that copies text to the clipboard.

After the script gets executed I need the output of the text to be pasted to another source. Is it possible to write a Python script that does this job?

How do I read text from the (Windows) clipboard with Python?

I am very new to PyDev and Python, though I have used Eclipse for Java plenty. I am trying to work through some of the Dive Into Python examples and this feels like an extremely trivial problem that's just becoming exceedingly annoying. I am using Ubuntu Linux 10.04.

I want to be able to use the file odbchelper.py, which is located in the directory `/Desktop/Python_Tutorials/diveintopython/py`

Here is my example.py file that I'm working on in my PyDev/Eclipse project:

```
import sys
sys.path.append("~/Desktop/Python_Tutorials/diveintopython/py")
```

This works fine, but then I want the next line of my code to be:

```
import odbchelper
```

and this causes an unresolved import error every time. I have added `__init__.py` files to just about every directory possible and it doesn't help anything. I've tried adding `__init__.py` files one at a time to the various levels of directories between the project location and the odbchelper.py file, and I've also tried adding the `__init__.py` files to all of the directories in between simultaneously. Neither works.

All I want to do is have a project somewhere in some other directory, say `/Desktop/MyStuff/Project`, in which I have example.py ... and then from example.py I want to import odbchelper.py from `/Desktop/Python_Tutorials/diveintopython/py/`.

Every message board response I can find just says to use the `sys.path.append()` function to add this directory to my path, and then import it ... but that is precisely what I am doing in my code and it's not working.

I have also tried the `Ctrl`-`1` trick to suppress the error message, but the program is still not functioning correctly. I get an error, `ImportError: No module named odbchelper`. So it's clearly not getting the path added, or there is some problem that all of my many permutations of adding `__init__.py` files have missed.

It's very frustrating that something this simple... calling things from some file that exists somewhere else on my machine... requires this much effort.
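One detail worth checking in the snippet above: Python does not expand `~` in paths passed to `sys.path.append`, so the appended entry never matches a real directory. A minimal sketch of the fix, using `os.path.expanduser` on the path from the question:

```python
import os
import sys

# sys.path entries must be real filesystem paths; Python does not expand "~",
# so expand it explicitly before appending
path = os.path.expanduser("~/Desktop/Python_Tutorials/diveintopython/py")
sys.path.append(path)
```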

Considering the example code, I would like to know how to apply gradient clipping to this network, an RNN where there is a possibility of exploding gradients.

```
tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)
```

This is an example that could be used, but where do I introduce this? In the definition of the RNN?

```
lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps
tf.clip_by_value(_X, -1, 1, name=None)
```

But this doesn't make sense, as the tensor `_X` is the input, not the gradient, which is what should be clipped.

Do I have to define my own Optimizer for this or is there a simpler option?

This post aims to give readers a primer on SQL-flavored merging with Pandas, how to use it, and when not to use it.

In particular, here's what this post will go through:

- The basics - types of joins (LEFT, RIGHT, OUTER, INNER)

- merging with different column names
- merging with multiple columns
- avoiding duplicate merge key column in output

What this post (and other posts by me on this thread) will not go through:

- Performance-related discussions and timings (for now). Mostly, notable mentions of better alternatives, wherever appropriate.
- Handling suffixes, removing extra columns, renaming outputs, and other specific use cases. There are other (read: better) posts that deal with that, so check those out instead!

Note: Most examples default to INNER JOIN operations while demonstrating various features, unless otherwise specified. Furthermore, all the DataFrames here can be copied and replicated so you can play with them. Also, see this post on how to read DataFrames from your clipboard.

Lastly, all visual representations of JOIN operations have been hand-drawn using Google Drawings. Inspiration from here.

`merge`

```
import numpy as np
import pandas as pd

np.random.seed(0)
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": np.random.randn(4)})
left
key value
0 A 1.764052
1 B 0.400157
2 C 0.978738
3 D 2.240893
right
key value
0 B 1.867558
1 D -0.977278
2 E 0.950088
3 F -0.151357
```

For the sake of simplicity, the key column has the same name (for now).

An **INNER JOIN** is represented by

Note: this, along with the forthcoming figures, all follow this convention:

- blue indicates rows that are present in the merge result
- red indicates rows that are excluded from the result (i.e., removed)
- green indicates missing values that are replaced with `NaN`s in the result

To perform an INNER JOIN, call `merge` on the left DataFrame, specifying the right DataFrame and the join key (at the very least) as arguments.

```
left.merge(right, on="key")
# Or, if you want to be explicit
# left.merge(right, on="key", how="inner")
key value_x value_y
0 B 0.400157 1.867558
1 D 2.240893 -0.977278
```

This returns only rows from `left` and `right` which share a common key (in this example, "B" and "D").

A **LEFT OUTER JOIN**, or LEFT JOIN is represented by

This can be performed by specifying `how="left"`.

```
left.merge(right, on="key", how="left")
key value_x value_y
0 A 1.764052 NaN
1 B 0.400157 1.867558
2 C 0.978738 NaN
3 D 2.240893 -0.977278
```

Carefully note the placement of NaNs here. If you specify `how="left"`, then only keys from `left` are used, and missing data from `right` is replaced by NaN.

And similarly, for a **RIGHT OUTER JOIN**, or RIGHT JOIN which is...

...specify `how="right"`:

```
left.merge(right, on="key", how="right")
key value_x value_y
0 B 0.400157 1.867558
1 D 2.240893 -0.977278
2 E NaN 0.950088
3 F NaN -0.151357
```

Here, keys from `right` are used, and missing data from `left` is replaced by NaN.

Finally, for the **FULL OUTER JOIN**, given by

specify `how="outer"`.

```
left.merge(right, on="key", how="outer")
key value_x value_y
0 A 1.764052 NaN
1 B 0.400157 1.867558
2 C 0.978738 NaN
3 D 2.240893 -0.977278
4 E NaN 0.950088
5 F NaN -0.151357
```

This uses the keys from both frames, and NaNs are inserted for missing rows in both.

The documentation summarizes these various merges nicely:

**LEFT-Excluding JOINs** and **RIGHT-Excluding JOINs** can be done in two steps.

For LEFT-Excluding JOIN, represented as

Start by performing a LEFT OUTER JOIN and then filtering the result to rows coming from `left` only:

```
(left.merge(right, on="key", how="left", indicator=True)
 .query("_merge == 'left_only'")
 .drop("_merge", axis=1))
key value_x value_y
0 A 1.764052 NaN
2 C 0.978738 NaN
```

Here, `indicator=True` adds a `_merge` column indicating each row's origin:

```
left.merge(right, on="key", how="left", indicator=True)
 key value_x value_y _merge
0 A 1.764052 NaN left_only
1 B 0.400157 1.867558 both
2 C 0.978738 NaN left_only
3 D 2.240893 -0.977278 both
```

And similarly, for a RIGHT-Excluding JOIN,

```
(left.merge(right, on="key", how="right", indicator=True)
 .query("_merge == 'right_only'")
 .drop("_merge", axis=1))
 key value_x value_y
2 E NaN 0.950088
3 F NaN -0.151357
```

Lastly, if you are required to do a merge that only retains keys from the left or right, but not both (in other words, performing an **ANTI-JOIN**),

You can do this in similar fashion:

```
(left.merge(right, on="key", how="outer", indicator=True)
 .query("_merge != 'both'")
 .drop("_merge", axis=1))
key value_x value_y
0 A 1.764052 NaN
2 C 0.978738 NaN
4 E NaN 0.950088
5 F NaN -0.151357
```

If the key columns are named differently (for example, `left` has `keyLeft`, and `right` has `keyRight` instead of `key`), then you will have to specify `left_on` and `right_on` as arguments instead of `on`:

```
left2 = left.rename({"key":"keyLeft"}, axis=1)
right2 = right.rename({"key":"keyRight"}, axis=1)
left2
keyLeft value
0 A 1.764052
1 B 0.400157
2 C 0.978738
3 D 2.240893
right2
keyRight value
0 B 1.867558
1 D -0.977278
2 E 0.950088
3 F -0.151357
```

```
left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")
keyLeft value_x keyRight value_y
0 B 0.400157 B 1.867558
1 D 2.240893 D -0.977278
```

When merging on `keyLeft` from `left` and `keyRight` from `right`, if you only want either of `keyLeft` or `keyRight` (but not both) in the output, you can start by setting the index as a preliminary step.

```
left3 = left2.set_index("keyLeft")
left3.merge(right2, left_index=True, right_on="keyRight")
value_x keyRight value_y
0 0.400157 B 1.867558
1 2.240893 D -0.977278
```

Contrast this with the output of the command just before (that is, the output of `left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")`), and you'll notice `keyLeft` is missing. You can figure out what column to keep based on which frame's index is set as the key. This may matter when, say, performing some OUTER JOIN operation.

Merging only a single column from one of the `DataFrames`

For example, consider

```
right3 = right.assign(newcol=np.arange(len(right)))
right3
key value newcol
0 B 1.867558 0
1 D -0.977278 1
2 E 0.950088 2
3 F -0.151357 3
```

If you are required to merge only "newcol" (without any of the other columns), you can usually just subset columns before merging:

```
left.merge(right3[["key", "newcol"]], on="key")
key value newcol
0 B 0.400157 0
1 D 2.240893 1
```

If you're doing a LEFT OUTER JOIN, a more performant solution would involve `map`:

```
# left["newcol"] = left["key"].map(right3.set_index("key")["newcol"])
left.assign(newcol=left["key"].map(right3.set_index("key")["newcol"]))
key value newcol
0 A 1.764052 NaN
1 B 0.400157 0.0
2 C 0.978738 NaN
3 D 2.240893 1.0
```

As mentioned, this is similar to, but faster than

```
left.merge(right3[["key", "newcol"]], on="key", how="left")
key value newcol
0 A 1.764052 NaN
1 B 0.400157 0.0
2 C 0.978738 NaN
3 D 2.240893 1.0
```

To join on more than one column, specify a list for `on` (or `left_on` and `right_on`, as appropriate).

```
left.merge(right, on=["key1", "key2"] ...)
```

Or, in the event the names are different,

```
left.merge(right, left_on=["lkey1", "lkey2"], right_on=["rkey1", "rkey2"])
```
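As a concrete illustration of the multi-column case (the frames and values here are made up for this example):

```python
import pandas as pd

# hypothetical frames sharing two key columns
left_m = pd.DataFrame({"key1": ["A", "A", "B"], "key2": [1, 2, 1], "lval": [10, 20, 30]})
right_m = pd.DataFrame({"key1": ["A", "B", "B"], "key2": [1, 1, 2], "rval": [100, 200, 300]})

# rows match only when BOTH key1 and key2 agree
out = left_m.merge(right_m, on=["key1", "key2"])
```

Only the ("A", 1) and ("B", 1) key pairs occur in both frames, so `out` has exactly two rows.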

Other useful `merge*` operations and functions:

- Merging a DataFrame with a Series on index: see this answer.
- Besides `merge`, `DataFrame.update` and `DataFrame.combine_first` are also used in certain cases to update one DataFrame with another.
- `pd.merge_ordered` is a useful function for ordered JOINs.
- `pd.merge_asof` (read: merge_asOf) is useful for *approximate* joins.
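As a quick taste of `pd.merge_asof` (the frames here are invented for illustration): each left row is matched with the last right row whose key is less than or equal to it, which is why both frames must be sorted on the key.

```python
import pandas as pd

trades = pd.DataFrame({"time": [1, 5, 10], "qty": [100, 200, 300]})
quotes = pd.DataFrame({"time": [1, 2, 7], "price": [10.0, 10.5, 11.0]})

# each trade picks up the most recent quote at or before its time
asof = pd.merge_asof(trades, quotes, on="time")
```

Here the trade at time 5 matches the quote at time 2, and the trade at time 10 matches the quote at time 7.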

**This section only covers the very basics, and is designed to only whet your appetite. For more examples and cases, see the documentation on merge, join, and concat as well as the links to the function specifications.**

Jump to other topics in Pandas Merging 101 to continue learning:

*You are here.*

You can adjust the subplot geometry in the very `tight_layout` call as follows:

```
fig.tight_layout(rect=[0, 0.03, 1, 0.95])
```

As it's stated in the documentation (https://matplotlib.org/users/tight_layout_guide.html): `tight_layout()` only considers ticklabels, axis labels, and titles. Thus, other artists may be clipped and also may overlap.
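A minimal, self-contained sketch of the `rect` trick (the figure contents are arbitrary choices of mine): reserving the top 5% of the figure keeps a `suptitle` from colliding with the subplots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)
fig.suptitle("Overall title")
# rect=[left, bottom, right, top] in figure coordinates: the subplots are
# laid out inside this box, leaving the top 5% free for the suptitle
fig.tight_layout(rect=[0, 0.03, 1, 0.95])
```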

Disclaimer: I'm mostly writing this post with syntactical considerations and general behaviour in mind. I'm not familiar with the memory and CPU aspect of the methods described, and I aim this answer at those who have reasonably small sets of data, such that the quality of the interpolation can be the main aspect to consider. I am aware that when working with very large data sets, the better-performing methods (namely `griddata` and `RBFInterpolator` without a `neighbors` keyword argument) might not be feasible.

Note that this answer uses the new `RBFInterpolator` class introduced in SciPy 1.7.0. For the legacy `Rbf` class see the previous version of this answer.

I'm going to compare three kinds of multi-dimensional interpolation methods (`interp2d`/splines, `griddata` and `RBFInterpolator`). I will subject them to two kinds of interpolation tasks and two kinds of underlying functions (points from which are to be interpolated). The specific examples will demonstrate two-dimensional interpolation, but the viable methods are applicable in arbitrary dimensions. Each method provides various kinds of interpolation; in all cases I will use cubic interpolation (or something close^{1}). It's important to note that whenever you use interpolation you introduce bias compared to your raw data, and the specific methods used affect the artifacts that you will end up with. Always be aware of this, and interpolate responsibly.

The two interpolation tasks will be

- upsampling (input data is on a rectangular grid, output data is on a denser grid)
- interpolation of scattered data onto a regular grid

The two functions (over the domain `[x, y] in [-1, 1]x[-1, 1]`) will be

- a smooth and friendly function: `cos(pi*x)*sin(pi*y)`; range in `[-1, 1]`
- an evil (and in particular, non-continuous) function: `x*y / (x^2 + y^2)` with a value of 0.5 near the origin; range in `[-0.5, 0.5]`

Here's how they look:

I will first demonstrate how the three methods behave under these four tests, then I'll detail the syntax of all three. If you know what you should expect from a method, you might not want to waste your time learning its syntax (looking at you, `interp2d`).

For the sake of explicitness, here is the code with which I generated the input data. While in this specific case I'm obviously aware of the function underlying the data, I will only use this to generate input for the interpolation methods. I use numpy for convenience (and mostly for generating the data), but scipy alone would suffice too.

```
import numpy as np
import scipy.interpolate as interp

# auxiliary function for mesh generation
def gimme_mesh(n):
    minval = -1
    maxval = 1
    # produce an asymmetric shape in order to catch issues with transpositions
    return np.meshgrid(np.linspace(minval, maxval, n),
                       np.linspace(minval, maxval, n + 1))

# set up underlying test functions, vectorized
def fun_smooth(x, y):
    return np.cos(np.pi*x) * np.sin(np.pi*y)

def fun_evil(x, y):
    # watch out for singular origin; function has no unique limit there
    return np.where(x**2 + y**2 > 1e-10, x*y/(x**2 + y**2), 0.5)

# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse, y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)

# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
rng = np.random.default_rng()
x_scattered, y_scattered = rng.random((2, N_scattered**2))*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)

# dense output mesh, 20x21 in shape
N_dense = 20
x_dense, y_dense = gimme_mesh(N_dense)
```

Let's start with the easiest task. Here's how an upsampling from a mesh of shape `[6, 7]` to one of `[20, 21]` works out for the smooth test function:

Even though this is a simple task, there are already subtle differences between the outputs. At first glance all three outputs are reasonable. There are two features to note, based on our prior knowledge of the underlying function: the middle case of `griddata` distorts the data most. Note the `y == -1` boundary of the plot (nearest the `x` label): the function should be strictly zero (since `y == -1` is a nodal line for the smooth function), yet this is not the case for `griddata`. Also note the `x == -1` boundary of the plots (behind, to the left): the underlying function has a local maximum (implying zero gradient near the boundary) at `[-1, -0.5]`, yet the `griddata` output shows clearly non-zero gradient in this region. The effect is subtle, but it's a bias nonetheless.

A slightly harder task is to perform upsampling on our evil function:

Clear differences are starting to show among the three methods. Looking at the surface plots, there are clear spurious extrema appearing in the output from `interp2d` (note the two humps on the right side of the plotted surface). While `griddata` and `RBFInterpolator` seem to produce similar results at first glance, both produce local minima near `[0.4, -0.4]` that are absent from the underlying function.

However, there is one crucial aspect in which `RBFInterpolator` is far superior: it respects the symmetry of the underlying function (which is of course also made possible by the symmetry of the sample mesh). The output from `griddata` breaks the symmetry of the sample points, which is already weakly visible in the smooth case.

Most often one wants to perform interpolation on scattered data. For this reason I expect these tests to be more important. As shown above, the sample points were chosen pseudo-uniformly in the domain of interest. In realistic scenarios you might have additional noise with each measurement, and you should consider whether it makes sense to interpolate your raw data to begin with.

Output for the smooth function:

Now there's already a bit of a horror show going on. I clipped the output from `interp2d` to between `[-1, 1]` exclusively for plotting, in order to preserve at least a minimal amount of information. It's clear that while some of the underlying shape is present, there are huge noisy regions where the method completely breaks down. The second case of `griddata` reproduces the shape fairly nicely, but note the white regions at the border of the contour plot. This is due to the fact that `griddata` only works inside the convex hull of the input data points (in other words, it doesn't perform any *extrapolation*). I kept the default NaN value for output points lying outside the convex hull.^{2} Considering these features, `RBFInterpolator` seems to perform best.

And the moment we've all been waiting for:

It's no huge surprise that `interp2d` gives up. In fact, during the call to `interp2d` you should expect some friendly `RuntimeWarning`s complaining about the impossibility of the spline to be constructed. As for the other two methods, `RBFInterpolator` seems to produce the best output, even near the borders of the domain where the result is extrapolated.

So let me say a few words about the three methods, in decreasing order of preference (so that the worst is the least likely to be read by anybody).

`scipy.interpolate.RBFInterpolator`

The RBF in the name of the `RBFInterpolator` class stands for "radial basis functions". To be honest I've never considered this approach until I started researching for this post, but I'm pretty sure I'll be using these in the future.

Just like the spline-based methods (see later), usage comes in two steps: first one creates a callable `RBFInterpolator` class instance based on the input data, and then calls this object for a given output mesh to obtain the interpolated result. Example from the smooth upsampling test:

```
import scipy.interpolate as interp

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
dense_points = np.stack([x_dense.ravel(), y_dense.ravel()], -1)  # shape (N, 2) in 2d

zfun_smooth_rbf = interp.RBFInterpolator(sparse_points, z_sparse_smooth.ravel(),
                                         smoothing=0, kernel="cubic")  # explicit default smoothing=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance

zfun_evil_rbf = interp.RBFInterpolator(sparse_points, z_sparse_evil.ravel(),
                                       smoothing=0, kernel="cubic")  # explicit default smoothing=0 for interpolation
z_dense_evil_rbf = zfun_evil_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance
```

Note that we had to do some array-building gymnastics to make the API of `RBFInterpolator` happy. Since we have to pass the 2d points as arrays of shape `(N, 2)`, we have to flatten the input grid and stack the two flattened arrays. The constructed interpolator also expects query points in this format, and the result will be a 1d array of shape `(N,)` which we have to reshape back to match our 2d grid for plotting. Since `RBFInterpolator` makes no assumptions about the number of dimensions of the input points, it supports arbitrary dimensions for interpolation.

So, `scipy.interpolate.RBFInterpolator`

- produces well-behaved output even for crazy input data
- supports interpolation in higher dimensions
- extrapolates outside the convex hull of the input points (of course extrapolation is always a gamble, and you should generally not rely on it at all)
- creates an interpolator as a first step, so evaluating it in various output points is less additional effort
- can have output point arrays of arbitrary shape (as opposed to being constrained to rectangular meshes, see later)
- is more likely to preserve the symmetry of the input data
- supports multiple kinds of radial functions for the keyword `kernel`: `multiquadric`, `inverse_multiquadric`, `inverse_quadratic`, `gaussian`, `linear`, `cubic`, `quintic`, `thin_plate_spline` (the default). As of SciPy 1.7.0 the class doesn't allow passing a custom callable due to technical reasons, but this is likely to be added in a future version.
- can give inexact interpolations by increasing the `smoothing` parameter

One drawback of RBF interpolation is that interpolating `N` data points involves inverting an `N x N` matrix. This quadratic complexity very quickly blows up memory need for a large number of data points. However, the new `RBFInterpolator` class also supports a `neighbors` keyword parameter that restricts the computation of each radial basis function to `k` nearest neighbours, thereby reducing memory need.
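A short sketch of the `neighbors` option (the sizes and kernel here are arbitrary choices of mine): the interpolator is constructed the same way, but each evaluation only uses the nearest data points.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
points = rng.random((1000, 2))  # 1000 scattered 2d points
values = np.cos(np.pi*points[:, 0]) * np.sin(np.pi*points[:, 1])

# only the 50 nearest data points enter each local RBF system,
# avoiding the full 1000x1000 solve
rbf_local = RBFInterpolator(points, values, neighbors=50, kernel="cubic")
query = rng.random((5, 2))
out = rbf_local(query)  # 1d array of shape (5,)
```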

`scipy.interpolate.griddata`

My former favourite, `griddata`, is a general workhorse for interpolation in arbitrary dimensions. It doesn't perform extrapolation beyond setting a single preset value for points outside the convex hull of the nodal points, but since extrapolation is a very fickle and dangerous thing, this is not necessarily a con. Usage example:

```
sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
z_dense_smooth_griddata = interp.griddata(sparse_points, z_sparse_smooth.ravel(),
                                          (x_dense, y_dense), method="cubic")  # default method is linear
```

Note that the same array transformations were necessary for the input arrays as for `RBFInterpolator`. The input points have to be specified in an array of shape `[N, D]` in `D` dimensions, or alternatively as a tuple of 1d arrays:

```
z_dense_smooth_griddata = interp.griddata((x_sparse.ravel(), y_sparse.ravel()),
                                          z_sparse_smooth.ravel(), (x_dense, y_dense), method="cubic")
```

The output point arrays can be specified as a tuple of arrays of arbitrary dimensions (as in both above snippets), which gives us some more flexibility.

In a nutshell, `scipy.interpolate.griddata`

- produces well-behaved output even for crazy input data
- supports interpolation in higher dimensions
- does not perform extrapolation; a single value can be set for the output outside the convex hull of the input points (see `fill_value`)
- computes the interpolated values in a single call, so probing multiple sets of output points starts from scratch
- can have output points of arbitrary shape
- supports nearest-neighbour and linear interpolation in arbitrary dimensions, cubic in 1d and 2d. Nearest-neighbour and linear interpolation use `NearestNDInterpolator` and `LinearNDInterpolator` under the hood, respectively. 1d cubic interpolation uses a spline, 2d cubic interpolation uses `CloughTocher2DInterpolator` to construct a continuously differentiable piecewise-cubic interpolator.
- might violate the symmetry of the input data

`scipy.interpolate.interp2d`/`scipy.interpolate.bisplrep`

The only reason I'm discussing `interp2d` and its relatives is that it has a deceptive name, and people are likely to try using it. Spoiler alert: don't use it (as of scipy version 1.7.0). It's already more special than the previous subjects in that it's specifically used for two-dimensional interpolation, but I suspect this is by far the most common case for multivariate interpolation.

As far as syntax goes, `interp2d` is similar to `RBFInterpolator` in that it first needs constructing an interpolation instance, which can be called to provide the actual interpolated values. There's a catch, however: the output points have to be located on a rectangular mesh, so inputs going into the call to the interpolator have to be 1d vectors which span the output grid, as if from `numpy.meshgrid`:

```
# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind="cubic") # default kind is "linear"
# reminder: x_dense and y_dense are of shape (20, 21) from numpy.meshgrid
xvec = x_dense[0,:] # 1d array of unique x values, 20 elements
yvec = y_dense[:,0] # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec, yvec) # output is (20, 21)-shaped array
```

One of the most common mistakes when using `interp2d` is putting your full 2d meshes into the interpolation call, which leads to explosive memory consumption, and hopefully to a hasty `MemoryError`.

Now, the greatest problem with `interp2d` is that it often doesn't work. In order to understand this, we have to look under the hood. It turns out that `interp2d` is a wrapper for the lower-level functions `bisplrep` + `bisplev`, which are in turn wrappers for FITPACK routines (written in Fortran). The equivalent call to the previous example would be

```
kind = "cubic"
if kind == "linear":
    kx = ky = 1
elif kind == "cubic":
    kx = ky = 3
elif kind == "quintic":
    kx = ky = 5

# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(), y_sparse.ravel(),
                              z_sparse_smooth.ravel(), kx=kx, ky=ky, s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec, yvec, bisp_smooth).T  # note the transpose
```

Now, here's the thing about `interp2d`: (in scipy version 1.7.0) there is a nice comment in `interpolate/interpolate.py` for `interp2d`:

```
if not rectangular_grid:
    # TODO: surfit is really not meant for interpolation!
    self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)
```

and indeed in `interpolate/fitpack.py`, in `bisplrep` there's some setup and ultimately

```
tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
                                task, s, eps, tx, ty, nxest, nyest,
                                wrk, lwrk1, lwrk2)
```

And that's it. The routines underlying `interp2d` are not really meant to perform interpolation. They might suffice for sufficiently well-behaved data, but under realistic circumstances you will probably want to use something else.

Just to conclude, `interpolate.interp2d`

- can lead to artifacts even with well-tempered data
- is specifically for bivariate problems (although there's the limited `interpn` for input points defined on a grid)
- performs extrapolation
- creates an interpolator as a first step, so evaluating it in various output points is less additional effort
- can only produce output over a rectangular grid; for scattered output you would have to call the interpolator in a loop
- supports linear, cubic and quintic interpolation
- might violate the symmetry of the input data

^{1} I'm fairly certain that the `cubic` and `linear` kind of basis functions of `RBFInterpolator` do not exactly correspond to the other interpolators of the same name.

^{2} These NaNs are also the reason why the surface plot seems so odd: matplotlib historically has difficulties with plotting complex 3d objects with proper depth information. The NaN values in the data confuse the renderer, so parts of the surface that should be in the back are plotted to be in the front. This is an issue with visualization, not interpolation.

Gradient clipping needs to happen after computing the gradients, but before applying them to update the model's parameters. In your example, both of those things are handled by the `AdamOptimizer.minimize()` method.

In order to clip your gradients you'll need to explicitly compute, clip, and apply them as described in this section of TensorFlow's API documentation. Specifically, you'll need to substitute the call to the `minimize()` method with something like the following:

```
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train_op = optimizer.apply_gradients(capped_gvs)
```

Despite what seems to be popular, you probably want to clip the whole gradient by its global norm:

```
optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimize = optimizer.apply_gradients(zip(gradients, variables))
```
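To make the arithmetic concrete, here is a small NumPy sketch of what global-norm clipping computes (the helper function is mine, not TensorFlow's API):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # the global norm is the 2-norm of all gradients concatenated;
    # if it exceeds clip_norm, every gradient is scaled by the same factor,
    # preserving their relative directions
    global_norm = np.sqrt(sum(np.sum(g**2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = sqrt(9+16+144) = 13
clipped, norm = clip_by_global_norm(grads, 5.0)   # every entry scaled by 5/13
```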

Clipping each gradient matrix individually changes their relative scale but is also possible:

```
optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients = [
    None if gradient is None else tf.clip_by_norm(gradient, 5.0)
    for gradient in gradients]
optimize = optimizer.apply_gradients(zip(gradients, variables))
```

In TensorFlow 2, a tape computes the gradients, the optimizers come from Keras, and we don't need to store the update op because it runs automatically without passing it to a session:

```
optimizer = tf.keras.optimizers.Adam(1e-3)
# ...
with tf.GradientTape() as tape:
    loss = ...
variables = ...
gradients = tape.gradient(loss, variables)
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer.apply_gradients(zip(gradients, variables))
```

**Update:** User cphyc has kindly created a GitHub repository for the code in this answer (see here), and bundled the code into a package which may be installed using `pip install matplotlib-label-lines`.

Pretty Picture:

In `matplotlib` it's pretty easy to label contour plots (either automatically or by manually placing labels with mouse clicks). There does not (yet) appear to be any equivalent capability to label data series in this fashion! There may be some semantic reason for not including this feature which I am missing.

Regardless, I have written the following module, which allows for semi-automatic plot labelling. It requires only `numpy` and a couple of functions from the standard `math` library.

The default behaviour of the `labelLines` function is to space the labels evenly along the `x` axis (automatically placing them at the correct `y`-value, of course). If you want, you can just pass an array of the x co-ordinates of each of the labels. You can even tweak the location of one label (as shown in the bottom-right plot) and space the rest evenly if you like.

In addition, the `labelLines` function does not account for lines which have not had a label assigned in the `plot` command (or more accurately, if the label contains `"_line"`).

Keyword arguments passed to `labelLines` or `labelLine` are passed on to the `text` function call (some keyword arguments are set if the calling code chooses not to specify).

- Annotation bounding boxes sometimes interfere undesirably with other curves, as shown by the `1` and `10` annotations in the top-left plot. I'm not even sure this can be avoided.
- It would be nice to specify a `y` position instead sometimes.
- It's still an iterative process to get annotations in the right location.
- It only works when the `x`-axis values are `float`s.

- By default, the `labelLines` function assumes that all data series span the range specified by the axis limits. Take a look at the blue curve in the top-left plot of the pretty picture. If there were only data available for the `x` range `0.5`-`1`, then we couldn't possibly place a label at the desired location (which is a little less than `0.2`). See this question for a particularly nasty example. Right now, the code does not intelligently identify this scenario and re-arrange the labels; however, there is a reasonable workaround. The `labelLines` function takes the `xvals` argument: a list of `x`-values specified by the user instead of the default linear distribution across the width. So the user can decide which `x`-values to use for the label placement of each data series.

Also, I believe this is the first answer to complete the *bonus* objective of aligning the labels with the curve they're on. :)

label_lines.py:

```
from math import atan2, degrees
import numpy as np

# Label line with its Line2D label data
def labelLine(line, x, label=None, align=True, **kwargs):
    ax = line.axes
    xdata = line.get_xdata()
    ydata = line.get_ydata()

    if (x < xdata[0]) or (x > xdata[-1]):
        print("x label location is outside data range!")
        return

    # Find corresponding y co-ordinate and angle of the line
    ip = 1
    for i in range(len(xdata)):
        if x < xdata[i]:
            ip = i
            break

    # Linearly interpolate the y-value at x between the bracketing points
    y = ydata[ip-1] + (ydata[ip] - ydata[ip-1]) * (x - xdata[ip-1]) / (xdata[ip] - xdata[ip-1])

    if not label:
        label = line.get_label()

    if align:
        # Compute the slope
        dx = xdata[ip] - xdata[ip-1]
        dy = ydata[ip] - ydata[ip-1]
        ang = degrees(atan2(dy, dx))

        # Transform to screen co-ordinates
        pt = np.array([x, y]).reshape((1, 2))
        trans_angle = ax.transData.transform_angles(np.array((ang,)), pt)[0]
    else:
        trans_angle = 0

    # Set a bunch of keyword arguments
    if "color" not in kwargs:
        kwargs["color"] = line.get_color()
    if ("horizontalalignment" not in kwargs) and ("ha" not in kwargs):
        kwargs["ha"] = "center"
    if ("verticalalignment" not in kwargs) and ("va" not in kwargs):
        kwargs["va"] = "center"
    if "backgroundcolor" not in kwargs:
        kwargs["backgroundcolor"] = ax.get_facecolor()
    if "clip_on" not in kwargs:
        kwargs["clip_on"] = True
    if "zorder" not in kwargs:
        kwargs["zorder"] = 2.5

    ax.text(x, y, label, rotation=trans_angle, **kwargs)

def labelLines(lines, align=True, xvals=None, **kwargs):
    ax = lines[0].axes
    labLines = []
    labels = []

    # Take only the lines which have labels other than the default ones
    for line in lines:
        label = line.get_label()
        if "_line" not in label:
            labLines.append(line)
            labels.append(label)

    if xvals is None:
        xmin, xmax = ax.get_xlim()
        xvals = np.linspace(xmin, xmax, len(labLines) + 2)[1:-1]

    for line, x, label in zip(labLines, xvals, labels):
        labelLine(line, x, label, align, **kwargs)
```

Test code to generate the pretty picture above:

```
from matplotlib import pyplot as plt
from scipy.stats import loglaplace, chi2
import numpy as np

from label_lines import *

X = np.linspace(0, 1, 500)
A = [1, 2, 5, 10, 20]
funcs = [np.arctan, np.sin, loglaplace(4).pdf, chi2(5).pdf]

plt.subplot(221)
for a in A:
    plt.plot(X, np.arctan(a * X), label=str(a))
labelLines(plt.gca().get_lines(), zorder=2.5)

plt.subplot(222)
for a in A:
    plt.plot(X, np.sin(a * X), label=str(a))
labelLines(plt.gca().get_lines(), align=False, fontsize=14)

plt.subplot(223)
for a in A:
    plt.plot(X, loglaplace(4).pdf(a * X), label=str(a))
xvals = [0.8, 0.55, 0.22, 0.104, 0.045]
labelLines(plt.gca().get_lines(), align=False, xvals=xvals, color="k")

plt.subplot(224)
for a in A:
    plt.plot(X, chi2(5).pdf(a * X), label=str(a))
lines = plt.gca().get_lines()
l1 = lines[-1]
labelLine(l1, 0.6, label=r"$Re=${}".format(l1.get_label()), ha="left", va="bottom", align=False)
labelLines(lines[:-1], align=False)

plt.show()
```

To get this to work with Jupyter (version 4.0.6) I created `~/.jupyter/custom/custom.css` containing:

```
/* Make the notebook cells take almost all available width */
.container {
width: 99% !important;
}
/* Prevent the edit cell highlight box from getting clipped;
* important so that it also works when cell is in edit mode*/
div.cell.selected {
border-left-width: 1px !important;
}
```

Alternatively, in plain text (also available as a screenshot):

```
Bracket Matching -. .- Line Numbering
Smart Indent -. | | .- UML Editing / Viewing
Source Control Integration -. | | | | .- Code Folding
Error Markup -. | | | | | | .- Code Templates
Integrated Python Debugging -. | | | | | | | | .- Unit Testing
Multi-Language Support -. | | | | | | | | | | .- GUI Designer (Qt, Eric, etc)
Auto Code Completion -. | | | | | | | | | | | | .- Integrated DB Support
Commercial/Free -. | | | | | | | | | | | | | | .- Refactoring
Cross Platform -. | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Atom |Y |F |Y |Y*|Y |Y |Y |Y |Y |Y | |Y |Y | | | | |*many plugins
Editra |Y |F |Y |Y | | |Y |Y |Y |Y | |Y | | | | | |
Emacs |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y | | | |
Eric Ide |Y |F |Y | |Y |Y | |Y | |Y | |Y | |Y | | | |
Geany |Y |F |Y*|Y | | | |Y |Y |Y | |Y | | | | | |*very limited
Gedit |Y |F |Y¹|Y | | | |Y |Y |Y | | |Y²| | | | |¹with plugin; ²sort of
Idle |Y |F |Y | |Y | | |Y |Y | | | | | | | | |
IntelliJ |Y |CF|Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |
JEdit |Y |F | |Y | | | | |Y |Y | |Y | | | | | |
KDevelop |Y |F |Y*|Y | | |Y |Y |Y |Y | |Y | | | | | |*no type inference
Komodo |Y |CF|Y |Y |Y |Y |Y |Y |Y |Y | |Y |Y |Y | |Y | |
NetBeans* |Y |F |Y |Y |Y | |Y |Y |Y |Y |Y |Y |Y |Y | | |Y |*pre-v7.0
Notepad++ |W |F |Y |Y | |Y*|Y*|Y*|Y |Y | |Y |Y*| | | | |*with plugin
Pfaide |W |C |Y |Y | | | |Y |Y |Y | |Y |Y | | | | |
PIDA |LW|F |Y |Y | | | |Y |Y |Y | |Y | | | | | |VIM based
PTVS |W |F |Y |Y |Y |Y |Y |Y |Y |Y | |Y | | |Y*| |Y |*WPF based
PyCharm |Y |CF|Y |Y*|Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |*JavaScript
PyDev (Eclipse) |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y | | | |
PyScripter |W |F |Y | |Y |Y | |Y |Y |Y | |Y |Y |Y | | | |
PythonWin |W |F |Y | |Y | | |Y |Y | | |Y | | | | | |
SciTE |Y |F¹| |Y | |Y | |Y |Y |Y | |Y |Y | | | | |¹Mac version is
ScriptDev |W |C |Y |Y |Y |Y | |Y |Y |Y | |Y |Y | | | | | commercial
Spyder |Y |F |Y | |Y |Y | |Y |Y |Y | | | | | | | |
Sublime Text |Y |CF|Y |Y | |Y |Y |Y |Y |Y | |Y |Y |Y*| | | |extensible w/Python,
TextMate |M |F | |Y | | |Y |Y |Y |Y | |Y |Y | | | | | *PythonTestRunner
UliPad |Y |F |Y |Y |Y | | |Y |Y | | | |Y |Y | | | |
Vim |Y |F |Y |Y |Y |Y |Y |Y |Y |Y | |Y |Y |Y | | | |
Visual Studio |W |CF|Y |Y |Y |Y |Y |Y |Y |Y |? |Y |? |? |Y |? |Y |
Visual Studio Code|Y |F |Y |Y |Y |Y |Y |Y |Y |Y |? |Y |? |? |? |? |Y |uses plugins
WingIde |Y |C |Y |Y*|Y |Y |Y |Y |Y |Y | |Y |Y |Y | | | |*support for C
Zeus |W |C | | | | |Y |Y |Y |Y | |Y |Y | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Cross Platform -' | | | | | | | | | | | | | | | |
Commercial/Free -' | | | | | | | | | | | | | | '- Refactoring
Auto Code Completion -' | | | | | | | | | | | | '- Integrated DB Support
Multi-Language Support -' | | | | | | | | | | '- GUI Designer (Qt, Eric, etc)
Integrated Python Debugging -' | | | | | | | | '- Unit Testing
Error Markup -' | | | | | | '- Code Templates
Source Control Integration -' | | | | '- Code Folding
Smart Indent -' | | '- UML Editing / Viewing
Bracket Matching -' '- Line Numbering
```

Acronyms used:

```
L - Linux
W - Windows
M - Mac
C - Commercial
F - Free
CF - Commercial with Free limited edition
? - To be confirmed
```

I don't mention basics like syntax highlighting as I expect these by default.

This is just a dry list reflecting your feedback and comments; I am not advocating any of these tools. I will keep updating this list as you keep posting your answers.

*PS. Can you help me add features of the above editors to the list (like auto-complete, debugging, etc.)?*

We have a comprehensive wiki page for this question https://wiki.python.org/moin/IntegratedDevelopmentEnvironments

Submit edits to the spreadsheet

*Note: The ideas here are pretty generic for Stack Overflow, indeed questions.*

do include a small* example DataFrame, either as runnable code:

```
In [1]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])
```

or make it "copy and pasteable" using `pd.read_clipboard(sep=r"\s\s+")`. You can format the text for Stack Overflow highlighting with `Ctrl`+`K` (or prepend four spaces to each line), or place three tildes above and below your code with your code unindented:

```
In [2]: df
Out[2]:
   A  B
0  1  2
1  1  3
2  4  6
```

Test `pd.read_clipboard(sep=r"\s\s+")` yourself.

*I really do mean **small**. The vast majority of example DataFrames could be fewer than 6 rows^{citation needed}, and **I bet I can do it in 5 rows.** Can you reproduce the error with `df = df.head()`? If not, fiddle around to see if you can make up a small DataFrame which exhibits the issue you are facing.*
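As a quick self-check of that workflow: `pd.read_clipboard` is essentially `pd.read_csv` applied to the clipboard contents, so the same whitespace parsing can be exercised without a clipboard by feeding pasted text through an `io.StringIO` buffer. A minimal sketch using the toy frame above:

```python
import io
import pandas as pd

# The table as it would be pasted into a question (no index column,
# to keep the round trip simple).
pasted = """\
A  B
1  2
1  3
4  6
"""

# sep=r"\s\s+" splits on runs of two or more whitespace characters,
# mirroring pd.read_clipboard(sep=r"\s\s+"); a regex separator needs
# the python parsing engine.
df = pd.read_csv(io.StringIO(pasted), sep=r"\s\s+", engine="python")
print(df)
```

This is handy for verifying that the text you are about to post actually round-trips back into the DataFrame you meant.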

*Every rule has an exception; the obvious one is for performance issues (in which case definitely use `%timeit` and possibly `%prun`), where you should generate a large frame (consider using `np.random.seed` so we have the exact same frame): `df = pd.DataFrame(np.random.randn(100000000, 10))`. Saying that, "make this code fast for me" is not strictly on topic for the site...*

do write out the outcome you desire (similarly to above):

```
In [3]: iwantthis
Out[3]:
   A  B
0  1  5
1  4  6
```

*Explain where the numbers come from: the 5 is the sum of the B column for the rows where A is 1.*

do show *the code* you've tried:

```
In [4]: df.groupby("A").sum()
Out[4]:
   B
A
1  5
4  6
```

*But say what's incorrect: the A column is in the index rather than a column.*

do show you've done some research (search the documentation, search Stack Overflow), and give a summary:

The docstring for sum simply states "Compute sum of group values".

The groupby documentation doesn't give any examples for this.

*Aside: the answer here is to use* `df.groupby("A", as_index=False).sum()`.
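To make that aside concrete, here is a minimal sketch contrasting the default `groupby` output with `as_index=False`, using the toy frame from above:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])

# Default: the grouping key "A" ends up in the index, not the columns.
indexed = df.groupby("A").sum()

# as_index=False keeps "A" as an ordinary column instead.
flat = df.groupby("A", as_index=False).sum()
print(flat)
```

The two results hold the same numbers; only the placement of `A` differs, which is exactly the complaint in the example above.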

do, if it's relevant that you have Timestamp columns (e.g. you're resampling or something), be explicit and apply `pd.to_datetime` to them for good measure:

```
df["date"] = pd.to_datetime(df["date"])  # this column ought to be dates...
```

*Sometimes this is the issue itself: they were strings.*
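A minimal sketch of why this matters: a column that prints like dates may still be plain strings, and only the dtype reveals it (the column name `date` here is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"date": ["2021-01-01", "2021-01-02"], "x": [1, 2]})

# Looks like dates when printed, but the dtype is object (strings).
before = df["date"].dtype

df["date"] = pd.to_datetime(df["date"])
after = df["date"].dtype  # now a proper datetime64 dtype
```

Checking `df.dtypes` before posting a question avoids this whole class of confusion.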

don't include a MultiIndex, which **we can't copy and paste** (see above). This is kind of a grievance with pandas' default display, but nonetheless annoying:

```
In [11]: df
Out[11]:
     C
A B
1 2  3
  2  6
```

*The correct way is to include an ordinary DataFrame with a* `set_index` *call:*

```
In [12]: df = pd.DataFrame([[1, 2, 3], [1, 2, 6]], columns=["A", "B", "C"]).set_index(["A", "B"])

In [13]: df
Out[13]:
     C
A B
1 2  3
  2  6
```

do provide insight into what it is when giving the outcome you want:

```
   B
A
1  1
5  0
```

*Be specific about how you got the numbers (what are they)... double-check they're correct.*

do, if your code throws an error, include the entire stack trace (this can be edited out later if it's too noisy). Show the line number (and the corresponding line of your code which it's raising against).

don't link to a CSV file we don't have access to (and ideally don't link to an external source at all...):

```
df = pd.read_csv("my_secret_file.csv")  # ideally with lots of parsing options
```

**Most data is proprietary**, we get that: make up similar data and see if you can reproduce the problem (something small).

don't explain the situation vaguely in words, like you have a DataFrame which is "large", mention some of the column names in passing (be sure not to mention their dtypes). Try and go into lots of detail about something which is completely meaningless without seeing the actual context. Presumably no one is even going to read to the end of this paragraph.

*Essays are bad; it's easier with small examples.*

don't include 10+ (100+??) lines of data munging before getting to your actual question.

*Please, we see enough of this in our day jobs. We want to help, but not like this... Cut the intro, and just show the relevant DataFrames (or small versions of them) in the step which is causing you trouble.*

Actually, `pywin32` and `ctypes` seem to be overkill for this simple task. `Tkinter` is a cross-platform GUI framework which ships with Python by default and has clipboard-accessing methods along with other cool stuff.

If all you need is to put some text on the system clipboard, this will do it:

```
from Tkinter import Tk
r = Tk()
r.withdraw()
r.clipboard_clear()
r.clipboard_append("i can has clipboardz?")
r.update()  # now it stays on the clipboard after the window is closed
r.destroy()
```

And that's all; no need to mess around with platform-specific third-party libraries.

If you are using Python 3, replace `Tkinter` with `tkinter`.
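For reference, a minimal Python 3 sketch of the same approach, wrapped in a function (this assumes a display is available for Tk to connect to at the point the function is actually called):

```python
import tkinter

def copy_to_clipboard(text):
    """Put `text` on the system clipboard via a hidden Tk root window."""
    r = tkinter.Tk()
    r.withdraw()             # hide the empty root window
    r.clipboard_clear()
    r.clipboard_append(text)
    r.update()               # keeps the text available after r is destroyed
    r.destroy()
```

The only change from the Python 2 version is the lowercase module name; the clipboard methods themselves are identical.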
