Orange 3 python руководство

Визуализация данных

Система Orange является инструментом для визуализации и анализа данных с
открытым исходным кодом. Интеллектуальный анализ данных проводится путем
визуального программирования и с помощью Python сценариев.

На рисунке представлен скриншот главного окна программы Orange3.

Визуализация данных

Рабочее пространство состоит из виджетов и связей между ними.

Каждый виджет имеет свой тип. Тип виджета можно определить по его
иконке.

Виджеты сгруппированы по разделам: Data, Visualization, Predictions и
пр. Группа виджета определяет цвет иконки.

Каждый виджет имеет множество (возможно, пустое) входных и множество
выходных сигналов. Сигнал определяет данные, которые поступают на вход
виджету или являются его результатом. При получении входного сигнала
виджет выполняет определенные действия и оповещает связанные с ним
виджеты путем отправки им соответсвующих сигналов.

Сигнал представляет собой экземпляр класса-наследника
Orange.util.Reprable.

Для загрузки датасета имеется множество виджетов. Самый простой (File)
считывает данные из файла или загружает по URL. Существуют виджеты для
получения данных из базы данных PostgreSQL, Google Docs и других
источников.

Скриншот параметров виджета File представлен на рисунке. Виджет
позволяет выбрать файл с жесткого диска или загрузить из интернета по
URL, а также выводит основные параметры датасета.

Параметры виджета File

Виджет File имеет единственный выходной сигнал Data (тип
Orange.data.Table). Он связан с единственным входным сигналом Data
виджета Data Table.

Виджет Data Table выводит данные из файла на экран.

Виджет Data Table

При обновлении файла обновляется Data Table. При создании связи между
виджетами входной и выходной сигналы выбираются автоматически. Если
сигналов виджета много, то могут возникнуть ошибки. Для редактирования
связи необходимо дважды кликнуть по ней мышью (рисунок [view1a]).

Связь виджетов File и Data Table

Связь между двумя виджетами подписывается над стрелкой. Если названия
входного и выходного сигнала совпадают, то указывается это название.
Если не совпадают, то указываются оба сигнала.

Виджет Scatter Plot позволяет строить двумерные графики по выбранным
признакам.

Виджет Scatter Plot

Виджет Scatter Plot имеет три входных сигнала:

  • Data (Orange.data.Table);

  • Data Subset (Orange.data.Table);

  • Features (Orange.widgets.widget.AttributeList).

Сигнал Data принимает данные для отображения на графике, а сигнал Data
Subset – подмножество данных. Если Data Subset определен, то на графике
будут заштрихованы точки, соответствующие Data Subset.

Так, можно выбрать некоторые элементы из таблицы данных Data Table, и
увидеть, как они расположены на графике по отношению к другим точкам. В
примере на рисунке выбраны образы 3, 4, 5, 6.

Можно выделить некоторые точки на графике и изучить значения признаков
соответствующих им объектов в таблице.

Система Orange содержит большое количество виджетов для визуализации
данных, не рассмотренных выше. Среди них:

  • Box Plot для построения диаграммы размаха (<<ящик с усами>>);

  • Distributions для построения диаграммы частотного распределения
    признака;

  • Heat Map для построения тепловой диаграммы;

  • Venn Diagram для построения диаграммы Венна;

  • Sieve Diagram для построения паркетной диаграммы Ридвиля и Шюпбаха;

  • Pythagorean Tree и Pythagorean Forest для построения деревьев
    Пифагора;

  • Mosaic Display для построения мозаичной диаграммы;

  • Tree Viewer для визуального представления древовидных структур;

  • FreeViz и Radviz для визуализации многомерных данных;

и другие.

Тестирование классификаторов

Схема программы для тестирования классификаторов приведена на рисунке.

Программа тестирования классификаторов

На форме размещены следующие виджеты:

  • File для чтения датасета из файла;

  • Python-скрипты MyNaiveBayes и MyMPF, классификаторы Байеса и методом
    потенциальных функций;

  • Test and Score, виджет для сравнения и оценки классификаторов;

  • Confusion Matrix, Scatter Plot, Classification results виджет для
    вывода результатов классификации;

MyNaiveBayes и MyMPF подробно рассмотрены в следующем разделе.

Виджет Test and Score принимает следующие входные сигналы @tas:

  • Data (Orange.data.Table) – данные, на которых будет обучена
    модель;

  • Test Data (Orange.data.Table) – данные для проверки модели;

  • Learner (Orange.classification) – один или несколько
    <<учеников>> – обученных моделей классификации, которые будут
    тестироваться.

В качестве результата виджет имеет следующие выходные сигналы:

  • Evaluation results (Orange.evaluation.Results) – результаты
    проверки классификаторов;

  • Predictions (Orange.data.Table) – размеченная тестовая выборка.

Виджет Test and Score позволяет тестировать классификаторы одним из
следующих методов:

  • кросс-валидация;

  • выделение одного;

  • случайная разбивка в заданном соотношении;

  • проверка на обучающей выборке;

  • проверка на тестовой выборке.

Проверка классификатора осуществлялась методом случайного деления
обучающей выборки в соотношении 33:66, т.е. треть данных была выделена
для тестирования. Операция повторится два раза без сохранения
промежуточной информации.

Результаты классификации приведены на рисунке.

Виджет Test and Score

Были вычислены численные характеристики классификации:

  • Area under ROC;

  • Classification accuracy;

  • F-1;

  • Precision;

  • Recall.

Для более детального анализа воспользуемся виджетом Confusion Matrix для
сравнения количества правильно и неправильно распознанных образов.
Результаты классификации MyMPF и MyNaiveBayes приведены на рисунках.

Результаты классификаци MyMPF

Результаты классификаци MyNaiveBayes

Образы, которые были подвергнуты классификации приведены на рисунках.

Сравнение результатов классификаци

Сравнение результатов классификаци

Результаты

Файлы orange-проектов можно скачать по ссылкам:

ViewData.ows

Recognition.ows


Программирование

Assembler,
awk,
Bash,
Django,
HTML,
Java,
JavaScript,
Python,
PHP,
regexp,
sed,
south.

Базы данных

SQL,
MySQL,
SQLile,

Серверы

Apache2,
BGBilling,
Debian,
GNU/Linux,
Zabbix.

Сети

Все,
iptables.

Документация

LaTeX.

Образование

Все,
РГАТУ.

Прочее

CrackMe,
CryptoGuard,
DB,
DC,
EPG,
ffmpeg,
HowTo,
IPTV,
MediaWiki,
motion,
Tools,
TV,
Видеонаблюдение,
VLC,
wget,
XML.

Orange Data Mining

Orange Data Mining

Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no programming or in-depth mathematical knowledge. We believe that workflow-based data science tools democratize data science by hiding complex underlying mechanics and exposing intuitive concepts. Anyone who owns data, or is motivated to peek into data, should have the means to do so.

Example Workflow

Installing

Easy installation

For easy installation, Download the latest released Orange version from our website. To install an add-on, head to Options -> Add-ons... in the menu bar.

Installing with Conda

First, install Miniconda for your OS.

Then, create a new conda environment, and install orange3:

# Add conda-forge to your channels for access to the latest release
conda config --add channels conda-forge

# Perhaps enforce strict conda-forge priority
conda config --set channel_priority strict

# Create and activate an environment for Orange
conda create python=3 --yes --name orange3
conda activate orange3

# Install Orange
conda install orange3

For installation of an add-on, use:

conda install orange3-<addon name>

See specific add-on repositories for details.

Installing with pip

We recommend using our standalone installer or conda, but Orange is also installable with pip. You will need a C/C++ compiler (on Windows we suggest using Microsoft Visual Studio Build Tools).
Orange needs PyQt to run. Install either:

  • PyQt5 and PyQtWebEngine: pip install -r requirements-pyqt.txt
  • PyQt6 and PyQt6-WebEngine: pip install PyQt6 PyQt6-WebEngine

Installing with winget (Windows only)

To install Orange with winget, run:

winget install --id  UniversityofLjubljana.Orange 

Running

Ensure you’ve activated the correct virtual environment. If following the above conda instructions:

Run orange-canvas or python3 -m Orange.canvas. Add --help for a list of program options.

Starting up for the first time may take a while.

Developing

GitHub Actions codecov Contributor count Latest GitHub commit

Want to write a widget? Use the Orange3 example add-on template.

Want to get involved? Join us on Discord, introduce yourself in #general!

Take a look at our contributing guide and style guidelines.

Check out our widget development docs for a comprehensive guide on writing Orange widgets.

The Orange ecosystem

The development of core Orange is primarily split into three repositories:

biolab/orange-canvas-core implements the canvas,
biolab/orange-widget-base is a handy widget GUI library,
biolab/orange3 brings it all together and implements the base data mining toolbox.

Additionally, add-ons implement additional widgets for more specific use cases. Anyone can write an add-on. Some of our first-party add-ons:

  • biolab/orange3-text
  • biolab/orange3-bioinformatics
  • biolab/orange3-timeseries
  • biolab/orange3-single-cell
  • biolab/orange3-imageanalytics
  • biolab/orange3-educational
  • biolab/orange3-geo
  • biolab/orange3-associate
  • biolab/orange3-network
  • biolab/orange3-explain

Setting up for core Orange development

First, fork the repository by pressing the fork button in the top-right corner of this page.

Set your GitHub username,

export MY_GITHUB_USERNAME=replaceme

create a conda environment, clone your fork, and install it:

conda create python=3 --yes --name orange3
conda activate orange3

git clone ssh://git@github.com/$MY_GITHUB_USERNAME/orange3

# Install PyQT and PyQtWebEngine. You can also use PyQt6
pip install -r requirements-pyqt.txt
pip install -e orange3

Now you’re ready to work with git. See GitHub’s guides on pull requests, forks if you’re unfamiliar. If you’re having trouble, get in touch on Discord.

Running

Run Orange with python -m Orange.canvas (after activating the conda environment).

python -m Orange.canvas -l 2 --no-splash --no-welcome will skip the splash screen and welcome window, and output more debug info. Use -l 4 for more.

Add --clear-widget-settings to clear the widget settings before start.

To explore the dark side of the Orange, try --style=fusion:breeze-dark

Argument --help lists all available options.

To run tests, use unittest Orange.tests Orange.widgets.tests

Setting up for development of all components

Should you wish to contribute Orange’s base components (the widget base and the canvas), you must also clone these two repositories from Github instead of installing them as dependencies of Orange3.

First, fork all the repositories to which you want to contribute.

Set your GitHub username,

export MY_GITHUB_USERNAME=replaceme

create a conda environment, clone your forks, and install them:

conda create python=3 --yes --name orange3
conda activate orange3

# Install PyQT and PyQtWebEngine. You can also use PyQt6
pip install -r requirements-pyqt.txt

git clone ssh://git@github.com/$MY_GITHUB_USERNAME/orange-widget-base
pip install -e orange-widget-base

git clone ssh://git@github.com/$MY_GITHUB_USERNAME/orange-canvas-core
pip install -e orange-canvas-core

git clone ssh://git@github.com/$MY_GITHUB_USERNAME/orange3
pip install -e orange3

# Repeat for any add-on repositories

It’s crucial to install orange-base-widget and orange-canvas-core before orange3 to ensure that orange3 will use your local versions.

This section describes how to load the data in Orange. We also show how to explore the data, perform some basic statistics, and how to sample the data.

Data Input¶

Orange can read files in proprietary tab-delimited format, or can load data from any of the major standard spreadsheet file types, like CSV and Excel. Native format starts with a header row with feature (column) names. The second header row gives the attribute type, which can be numeric, categorical, time, or string. The third header line contains meta information to identify dependent features (class), irrelevant features (ignore) or meta features (meta).
More detailed specification is available in Loading and saving data (io).
Here are the first few lines from a dataset lenses.tab:

age       prescription  astigmatic    tear_rate     lenses
discrete  discrete      discrete      discrete      discrete
                                                    class
young     myope         no            reduced       none
young     myope         no            normal        soft
young     myope         yes           reduced       none
young     myope         yes           normal        hard
young     hypermetrope  no            reduced       none

Values are tab-limited. This dataset has four attributes (age of the patient, spectacle prescription, notion on astigmatism, and information on tear production rate) and an associated three-valued dependent variable encoding lens prescription for the patient (hard contact lenses, soft contact lenses, no lenses). Feature descriptions could use one letter only, so the header of this dataset could also read:

age       prescription  astigmatic    tear_rate     lenses
d         d             d             d             d
                                                    c

The rest of the table gives the data. Note that there are 5 instances in our table above. For the full dataset, check out or download lenses.tab) to a target directory. You can also skip this step as Orange comes preloaded with several demo datasets, lenses being one of them. Now, open a python shell, import Orange and load the data:

>>> import Orange
>>> data = Orange.data.Table("lenses")
>>>

Note that for the file name no suffix is needed, as Orange checks if any files in the current directory are of a readable type. The call to Orange.data.Table creates an object called data that holds your dataset and information about the lenses domain:

>>> data.domain.attributes
(DiscreteVariable('age', values=('pre-presbyopic', 'presbyopic', 'young')),
 DiscreteVariable('prescription', values=('hypermetrope', 'myope')),
 DiscreteVariable('astigmatic', values=('no', 'yes')),
 DiscreteVariable('tear_rate', values=('normal', 'reduced')))
>>> data.domain.class_var
DiscreteVariable('lenses', values=('hard', 'none', 'soft'))
>>> for d in data[:3]:
   ...:     print(d)
   ...:
[young, myope, no, reduced | none]
[young, myope, no, normal | soft]
[young, myope, yes, reduced | none]
>>>

The following script wraps-up everything we have done so far and lists first 5 data instances with soft prescription:

import Orange

data = Orange.data.Table("lenses")
print("Attributes:", ", ".join(x.name for x in data.domain.attributes))
print("Class:", data.domain.class_var.name)
print("Data instances", len(data))

target = "soft"
print("Data instances with %s prescriptions:" % target)
atts = data.domain.attributes
for d in data:
    if d.get_class() == target:
        print(" ".join(["%14s" % str(d[a]) for a in atts]))

Note that data is an object that holds both the data and information on the domain. We show above how to access attribute and class names, but there is much more information there, including that on feature type, set of values for categorical features, and other.

Creating a Data Table¶

To create a data table from scratch, one needs two things, a domain and the data. The domain is the description of the variables, i.e. column names, types, roles, etc.

First, we create the said domain. We will create three types of variables, numeric (ContiniousVariable), categorical (DiscreteVariable) and text (StringVariable). Numeric and categorical variables will be used a features (also known as X), while the text variable will be used as a meta variable.

>>> from Orange.data import Domain, ContinuousVariable,
    DiscreteVariable, StringVariable
>>>
>>> domain = Domain([ContinuousVariable("col1"),
                     DiscreteVariable("col2", values=["red", "blue"])],
                     metas=[StringVariable("col3")])

Now, we will build the data with numpy.

>>> import numpy as np
>>>
>>> column1 = np.array([1.2, 1.4, 1.5, 1.1, 1.2])
>>> column2 = np.array([0, 1, 1, 1, 0])
>>> column3 = np.array(["U13", "U14", "U15", "U16", "U17"], dtype=object)

Two things to note here. column2 has values 0 and 1, even though we specified it will be a categorical variable with values «red» and «blue». X (features in the data) can only be numbers, so the numpy matrix will contain numbers, while Orange will handle the categorical representation internally. 0 will be mapped to the value «red» and 1 to «blue» (in the order, specified in the domain).

Text variable requires dtype=object for numpy to handle it correctly.

Next, variables have to be transformed to a matrix.

>>> X = np.column_stack((column1, column2))
>>> M = column3.reshape(-1, 1)

Finally, we create a table. We need a domain and variables, which can be passed as X (features), Y (class variable) or metas.

>>> table = Table.from_numpy(domain, X=X, metas=M)
>>> print(table)
>>> [[1.2, red] {U13},
[1.4, blue] {U14},
[1.5, blue] {U15},
[1.1, blue] {U16},
[1.2, red] {U17}]

To add a class variable to the table, the procedure would be the same, with the class variable passed as Y (e.g. table = Table.from_numpy(domain, X=X, Y=Y, metas=M)).

To add a single column to the table, one can use the Table.add_column() method.

>>> new_var = DiscreteVariable("var4", values=["one", "two"])
>>> var4 = np.array([0, 1, 0, 0, 1]) # no reshaping necessary
>>> table = table.add_column(new_var, var4)
>>> print(table)
>>> [[1.2, red, one] {U13},
[1.4, blue, two] {U14},
[1.5, blue, one] {U15},
[1.1, blue, one] {U16},
[1.2, red, two] {U17}]

Saving the Data¶

Data objects can be saved to a file:

>>> data.save("new_data.tab")
>>>

This time, we have to provide the file extension to specify the output format. An extension for native Orange’s data format is «.tab». The following code saves only the data items with myope perscription:

import Orange

data = Orange.data.Table("lenses")
myope_subset = [d for d in data if d["prescription"] == "myope"]
new_data = Orange.data.Table(data.domain, myope_subset)
new_data.save("lenses-subset.tab")

We have created a new data table by passing the information on the structure of the data (data.domain) and a subset of data instances.

Exploration of the Data Domain¶

Data table stores information on data instances as well as on data domain. Domain holds the names of attributes, optional classes, their types and, and if categorical, the value names. The following code:

import Orange

data = Orange.data.Table("imports-85.tab")
n = len(data.domain.attributes)
n_cont = sum(1 for a in data.domain.attributes if a.is_continuous)
n_disc = sum(1 for a in data.domain.attributes if a.is_discrete)
print("%d attributes: %d continuous, %d discrete" % (n, n_cont, n_disc))

print(
    "First three attributes:",
    ", ".join(data.domain.attributes[i].name for i in range(3)),
)

print("Class:", data.domain.class_var.name)

outputs:

25 attributes: 14 continuous, 11 discrete
First three attributes: symboling, normalized-losses, make
Class: price

Orange’s objects often behave like Python lists and dictionaries, and can be indexed or accessed through feature names:

print("First attribute:", data.domain[0].name)
name = "fuel-type"
print("Values of attribute '%s': %s" % (name, ", ".join(data.domain[name].values)))

The output of the above code is:

First attribute: symboling
Values of attribute 'fuel-type': diesel, gas

Data Instances¶

Data table stores data instances (or examples). These can be indexed or traversed as any Python list. Data instances can be considered as vectors, accessed through element index, or through feature name.

import Orange

data = Orange.data.Table("iris")
print("First three data instances:")
for d in data[:3]:
    print(d)

print("25-th data instance:")
print(data[24])

name = "sepal width"
print("Value of '%s' for the first instance:" % name, data[0][name])
print("The 3rd value of the 25th data instance:", data[24][2])

The script above displays the following output:

First three data instances:
[5.100, 3.500, 1.400, 0.200 | Iris-setosa]
[4.900, 3.000, 1.400, 0.200 | Iris-setosa]
[4.700, 3.200, 1.300, 0.200 | Iris-setosa]
25-th data instance:
[4.800, 3.400, 1.900, 0.200 | Iris-setosa]
Value of 'sepal width' for the first instance: 3.500
The 3rd value of the 25th data instance: 1.900

The Iris dataset we have used above has four continuous attributes. Here’s a script that computes their mean:

average = lambda x: sum(x) / len(x)

data = Orange.data.Table("iris")
print("%-15s %s" % ("Feature", "Mean"))
for x in data.domain.attributes:
    print("%-15s %.2f" % (x.name, average([d[x] for d in data])))

The above script also illustrates indexing of data instances with objects that store features; in d[x] variable x is an Orange object. Here’s the output:

Feature         Mean
sepal length    5.84
sepal width     3.05
petal length    3.76
petal width     1.20

A slightly more complicated, but also more interesting, code that computes per-class averages:

average = lambda xs: sum(xs) / float(len(xs))

data = Orange.data.Table("iris")
targets = data.domain.class_var.values
print("%-15s %s" % ("Feature", " ".join("%15s" % c for c in targets)))
for a in data.domain.attributes:
    dist = [
        "%15.2f" % average([d[a] for d in data if d.get_class() == c]) for c in targets
    ]
    print("%-15s" % a.name, " ".join(dist))

Of the four features, petal width and length look quite discriminative for the type of iris:

Feature             Iris-setosa Iris-versicolor  Iris-virginica
sepal length               5.01            5.94            6.59
sepal width                3.42            2.77            2.97
petal length               1.46            4.26            5.55
petal width                0.24            1.33            2.03

Finally, here is a quick code that computes the class distribution for another dataset:

import Orange
from collections import Counter

data = Orange.data.Table("lenses")
print(Counter(str(d.get_class()) for d in data))

Orange Datasets and NumPy¶

Orange datasets are actually wrapped NumPy arrays. Wrapping is performed to retain the information about the feature names and values, and NumPy arrays are used for speed and compatibility with different machine learning toolboxes, like scikit-learn, on which Orange relies. Let us display the values of these arrays for the first three data instances of the iris dataset:

>>> data = Orange.data.Table("iris")
>>> data.X[:3]
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2]])
>>> data.Y[:3]
array([ 0.,  0.,  0.])

Notice that we access the arrays for attributes and class separately, using data.X and data.Y. Average values of attributes can then be computed efficiently by:

>>> import np as numpy
>>> np.mean(data.X, axis=0)
array([ 5.84333333,  3.054     ,  3.75866667,  1.19866667])

We can also construct a (classless) dataset from a numpy array:

>>> X = np.array([[1,2], [4,5]])
>>> data = Orange.data.Table(X)
>>> data.domain
[Feature 1, Feature 2]

If we want to provide meaninful names to attributes, we need to construct an appropriate data domain:

>>> domain = Orange.data.Domain([Orange.data.ContinuousVariable("lenght"),
                                 Orange.data.ContinuousVariable("width")])
>>> data = Orange.data.Table(domain, X)
>>> data.domain
[lenght, width]

Here is another example, this time with the construction of a dataset that includes a numerical class and different types of attributes:

size = Orange.data.DiscreteVariable("size", ["small", "big"])
height = Orange.data.ContinuousVariable("height")
shape = Orange.data.DiscreteVariable("shape", ["circle", "square", "oval"])
speed = Orange.data.ContinuousVariable("speed")

domain = Orange.data.Domain([size, height, shape], speed)

X = np.array([[1, 3.4, 0], [0, 2.7, 2], [1, 1.4, 1]])
Y = np.array([42.0, 52.2, 13.4])

data = Orange.data.Table(domain, X, Y)
print(data)

Running of this scripts yields:

[[big, 3.400, circle | 42.000],
 [small, 2.700, oval | 52.200],
 [big, 1.400, square | 13.400]

Missing Values¶

Consider the following exploration of the dataset on votes of the US senate:

>>> import numpy as np
>>> data = Orange.data.Table("voting.tab")
>>> data[2]
[?, y, y, ?, y, ... | democrat]
>>> np.isnan(data[2][0])
True
>>> np.isnan(data[2][1])
False

The particular data instance included missing data (represented with ‘?’) for the first and the fourth attribute. In the original dataset file, the missing values are, by default, represented with a blank space. We can now examine each attribute and report on proportion of data instances for which this feature was undefined:

data = Orange.data.Table("voting.tab")
for x in data.domain.attributes:
    n_miss = sum(1 for d in data if np.isnan(d[x]))
    print("%4.1f%% %s" % (100.0 * n_miss / len(data), x.name))

First three lines of the output of this script are:

 2.8% handicapped-infants
11.0% water-project-cost-sharing
 2.5% adoption-of-the-budget-resolution

A single-liner that reports on number of data instances with at least one missing value is:

>>> sum(any(np.isnan(d[x]) for x in data.domain.attributes) for d in data)
203

Data Selection and Sampling¶

Besides the name of the data file, Orange.data.Table can accept the data domain and a list of data items and returns a new dataset. This is useful for any data subsetting:

data = Orange.data.Table("iris.tab")
print("Dataset instances:", len(data))
subset = Orange.data.Table(data.domain, [d for d in data if d["petal length"] > 3.0])
print("Subset size:", len(subset))

The code outputs:

Dataset instances: 150
Subset size: 99

and inherits the data description (domain) from the original dataset. Changing the domain requires setting up a new domain descriptor. This feature is useful for any kind of feature selection:

data = Orange.data.Table("iris.tab")
new_domain = Orange.data.Domain(
    list(data.domain.attributes[:2]),
    data.domain.class_var
)
new_data = Orange.data.Table(new_domain, data)

print(data[0])
print(new_data[0])

We could also construct a random sample of the dataset:

>>> sample = Orange.data.Table(data.domain, random.sample(data, 3))
>>> sample
[[6.000, 2.200, 4.000, 1.000 | Iris-versicolor],
 [4.800, 3.100, 1.600, 0.200 | Iris-setosa],
 [6.300, 3.400, 5.600, 2.400 | Iris-virginica]
]

or randomly sample the attributes:

>>> atts = random.sample(data.domain.attributes, 2)
>>> domain = Orange.data.Domain(atts, data.domain.class_var)
>>> new_data = Orange.data.Table(domain, data)
>>> new_data[0]
[5.100, 1.400 | Iris-setosa]

Building Machine Learning Model Using Orange Is Fun

Introduction:Data has become a powerful source of earning and predict future and people will seek to utilize it even if they don’t know exactly how. Machine learning will become a usual part of programmer’s resume, data scientists will be as common as accountants. Nowadays and for the next approximately two decades, we will continue to see a major need for machine learning and data science specialists to help apply machine learning technologies to application areas where they aren’t applied today.Nowadays people prefer GUI based tools instead of more coding stuff. Orange is one of the popular open source machine learning and data visualization tool for beginners. People who don’t know more about coding and willing to visualize pattern and other stuff can easily work with Orange.Why Orange:Orange is an open-source software package released under GPL that powers Python scripts with its rich compilation of mining and machine learning algorithms for data pre-processing, classification, modeling, regression, clustering and other miscellaneous functions.Orange also comes with a visual programming environment and its workbench consists of tools for importing data, dragging and dropping widgets, and links to connect different widgets for completing the workflow.Orange uses common Python open-source libraries for scientific computing, such as numpy, scipy, and scikit-learn, while its graphical user interface operates within the cross-platform Qt framework.Getting Started:Here is how to get started with data visualization in orange.Download Orange=> Go to https://orange.biolab.si and click on Download.For Linux Users:AnacondaIf you are using python provided by Anaconda distribution, you are almost ready to go. If not, follow these steps to download Orange:conda config —add channels conda-forgeand runconda install orange3 conda install -c defaults pyqt=5 qtPIPOrange can also be installed from the Python Package Index. You may need additional system packages provided by your distribution.pip install orange3Run shortcode to verify your setup Orange successfully. Open your Python Terminal and run the following code :>>> import Orange
>>> Orange.version.version
‘3.2.dev0+8907f35’
>>>Note: If You find result shown above then you successfully setup Orange. In case you get an error like this :from Orange.data import _variable
ImportError: cannot import name ‘_variable’Kindly Follow These Steps: Install Orange From SourceMac and Windows user can easily setup orange in their system step by step just by following the official docs of Orange:Official docs for setup OrangeAfter installation let’s start working with OrangeVisualization of DataSet in Orange:A primary goal of data visualization is to communicate information clearly and efficiently via statistical graphics, plots and information graphics.The Main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information.” -Friedman (2008)Open Orange on your system & create your own new Workflow:After you clicked on “New” in the above step, this is what you should have come up with:In this tutorial, we are going to see the steps for Visualization of DataSet in orange:Step 1: Without data, there is no existence of Machine Learning. So, In our first step we import our dataset and in this tutorial, I use example dataset available in Orange Directory. We import zoo.tab dataset in file widget:Step 2: In next step, we need data tables to view our dataset. For this, we use Data Table widget.When we double-click on the data table widget we can visualize our data in actual format.Step 3: This is the last step where we will understand our data, with the help of visualization. Orange make visualization pretty much easier. We just add one more widget and choose which format we would like to visualize our data like Scatter Plot.This completely works on the concept of neurons, data transfer from one layer to another layer when we connect data table to scatter plot widget then we find an actual representation of our data in the form of scatter plot.Closing Note:Orange is the most powerful tool used for almost any kind of analysis and visualizing dataset is fun using Orange. The default installation includes a number of machine learning, preprocessing and data visualization algorithms in 6 widget sets (data, visualize, classify, regression, evaluate and unsupervised). Additional functionalities are available as add-ons (bioinformatics, data fusion and, text-mining).Hope, this tutorial helps you to understand how to visualize data set using orange. It is very important to understand the flow of data, this helps you to figure out problems easily.Keep Practicing with OrangeHappy Coding

Rated 4.0/5
based on 29 customer reviews

  1. Home
  2. Magazine
  3. Building Machine Learning Model Using Orange Is Fun

Building Machine Learning Model Using Orange Is Fun

By Himanshu Awasthi
19th Apr, 2018

  • Category:
    Blog



  • 71

Building Machine Learning Model Using Orange Is Fun

Introduction:

Data has become a powerful source of earning and predict future and people will seek to utilize it even if they don’t know exactly how. Machine learning will become a usual part of programmer’s resume, data scientists will be as common as accountants. Nowadays and for the next approximately two decades, we will continue to see a major need for machine learning and data science specialists to help apply machine learning technologies to application areas where they aren’t applied today.

Nowadays people prefer GUI based tools instead of more coding stuff. Orange is one of the popular open source machine learning and data visualization tool for beginners. People who don’t know more about coding and willing to visualize pattern and other stuff can easily work with Orange.

Why Orange:

Orange is an open-source software package released under GPL that powers Python scripts with its rich compilation of mining and machine learning algorithms for data pre-processing, classification, modeling, regression, clustering and other miscellaneous functions.

Orange also comes with a visual programming environment and its workbench consists of tools for importing data, dragging and dropping widgets, and links to connect different widgets for completing the workflow.

Orange uses common Python open-source libraries for scientific computing, such as numpy, scipy, and scikit-learn, while its graphical user interface operates within the cross-platform Qt framework.

Getting Started:

Here is how to get started with data visualization in orange.
Download Orange
=> Go to https://orange.biolab.si and click on Download.

Linux Users:
For Linux Users:

Anaconda
If you are using python provided by Anaconda distribution, you are almost ready to go. If not, follow these steps to download Orange:

conda config —add channels conda-forge

and run

conda install orange3 conda install -c defaults pyqt=5 qt

PIP
Orange can also be installed from the Python Package Index. You may need additional system packages provided by your distribution.

pip install orange3

Run shortcode to verify your setup Orange successfully. Open your Python Terminal and run the following code :

>>> import Orange
>>> Orange.version.version
'3.2.dev0+8907f35'
>>>

Note: If You find result shown above then you successfully setup Orange. In case you get an error like this :

from Orange.data import _variable
ImportError: cannot import name '_variable'

Visualization of DataSet in Orange:

A primary goal of data visualization is to communicate information clearly and efficiently via statistical graphics, plots and information graphics.

The Main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information.” -Friedman (2008)

Open Orange on your system & create your own new Workflow:


After you clicked on “New” in the above step, this is what you should have come up with:

Visualization of DataSet in orange:

In this tutorial, we are going to see the steps for Visualization of DataSet in orange:

Step 1: Without data, there is no existence of Machine Learning. So, In our first step we import our dataset and in this tutorial, I use example dataset available in Orange Directory. We import zoo.tab dataset in file widget:

Step 2: In next step, we need data tables to view our dataset. For this, we use Data Table widget.

visualize our data in actual format.
When we double-click on the data table widget we can visualize our data in actual format.


Step 3: This is the last step where we will understand our data, with the help of visualization. Orange make visualization pretty much easier. We just add one more widget and choose which format we would like to visualize our data like Scatter Plot.


This completely works on the concept of neurons, data transfer from one layer to another layer when we connect data table to scatter plot widget then we find an actual representation of our data in the form of scatter plot.

Closing Note:

Orange is the most powerful tool used for almost any kind of analysis and visualizing dataset is fun using Orange. The default installation includes a number of machine learning, preprocessing and data visualization algorithms in 6 widget sets (data, visualize, classify, regression, evaluate and unsupervised). Additional functionalities are available as add-ons (bioinformatics, data fusion and, text-mining).

Hope, this tutorial helps you to understand how to visualize data set using orange. It is very important to understand the flow of data, this helps you to figure out problems easily.

Keep Practicing with Orange

Happy Coding

  • Big Data
  • Artificial Intelligence

REQUEST A FREE DEMO CLASS

SUBSCRIBE OUR BLOG

TRENDING BLOG POSTS

Python in a Nutshell: Everything That You Need to Know

By Susan May

Python is one of the best known high-level programming languages in the world, like Java. It’s steadily gaining traction among programmers because it’s easy to integrate with other technologies and offers more stability and higher coding productivity, especially when it comes to mass projects with volatile requirements. If you’re considering learning an object-oriented programming language, consider starting with Python.A Brief Background On Python It was first created in 1991 by Guido Van Rossum, who eventually wants Python to be as understandable and clear as English. It’s open source, so anyone can contribute to, and learn from it. Aside from supporting object-oriented programming and imperative and functional programming, it also made a strong case for readable code. Python is hence, a multi-paradigm high-level programming language that is also structure supportive and offers meta-programming and logic-programming as well as ‘magic methods’.More Features Of PythonReadability is a key factor in Python, limiting code blocks by using white space instead, for a clearer, less crowded appearancePython uses white space to communicate the beginning and end of blocks of code, as well as ‘duck typing’ or strong typingPrograms are small and run quickerPython requires less code to create a program but is slow in executionRelative to Java, it’s easier to read and understand. It’s also more user-friendly and has a more intuitive coding styleIt compiles native bytecodeWhat It’s Used For, And By WhomUnsurprisingly, Python is now one of the top five most popular programming languages in the world. It’s helping professionals solve an array of technical, as well as business problems. For example, every day in the USA, over 36,000 weather forecasts are issued in more than 800 regions and cities. These forecasts are put in a database, compared to actual conditions encountered location-wise, and the results are then tabulated to improve the forecast models, the next time around. The programming language allowing them to collect, analyze, and report this data? Python!40% of data scientists in a survey taken by industry analyst O’Reilly in 2013, reported using Python in their day-to-day workCompanies like Google, NASA, and CERN use Python for a gamut of programming purposes, including data scienceIt’s also used by Wikipedia, Google, and Yahoo!, among many othersYouTube, Instagram, Quora, and Dropbox are among the many apps we use every day, that use PythonPython has been used by digital special effects house ILM, who has worked on the Star Wars and Marvel filmsIt’s often used as a ‘scripting language’ for web apps and can automate a specific progression of tasks, making it more efficient. That’s why it is used in the development of software applications, web pages, operating systems shells, and games. It’s also used in scientific and mathematical computing, as well as AI projects, 3D modelers and animation packages.Is Python For You? Programming students find it relatively easy to pick up Python. It has an ever-expanding list of applications and is one of the hottest languages in the ICT world. Its functions can be executed with simpler commands and much less text than most other programming languages. That could explain its popularity amongst developers and coding students.If you’re a professional or a student who wants to pursue a career in programming, web or app development, then you will definitely benefit from a Python training course. It would help if you have prior knowledge of basic programming concepts and object-oriented concepts. To help you understand how to approach Python better, let’s break up the learning process into three modules:Elementary PythonThis is where you’ll learn syntax, keywords, loops data types, classes, exception handling, and functions.Advanced PythonIn Advanced Python, you’ll learn multi-threading, database programming (MySQL/ MongoDB), synchronization techniques and socket programming.Professional PythonProfessional Python involves knowing concepts like image processing, data analytics and the requisite libraries and packages, all of which are highly sophisticated and valued technologies.With a firm resolve and determination, you can definitely get certified with Python course!Some Tips To Keep In Mind While Learning PythonFocus on grasping the fundamentals, such as object-oriented programming, variables, and control flow structuresLearn to unit test Python applications and try out its strong integration and text processing capabilitiesPractice using Python’s object-oriented design and extensive support libraries and community to deliver projects and packages. Assignments aren’t necessarily restricted to the four-function calendar and check balancing programs. By using the Python library, programming students can work on realistic applications as they learn the fundamentals of coding and code reuse.

Rated 4.5/5
based on 12 customer reviews


Python in a Nutshell: Everything That You Need to Know

Python in a Nutshell: Everything That You Need to Know

Blog

The Ultimate Guide to Node.Js

By Susan May

IT professionals have always been in much demand, but with a Node.js course under your belt, you will be more sought after than the average developer. In fact, recruiters look at Node js as a major recruitment criterion these days.  Why are Node.js developers so sought-after, you may ask. It is because Node.js requires much less development time and fewer servers, and provides unparalleled scalability.In fact, LinkedIn uses it as it has substantially decreased the development time. Netflix uses it because Node.js has improved the application’s load time by 70%. Even PayPal, IBM, eBay, Microsoft, and Uber use it. These days, a lot of start-ups, too, have jumped on the bandwagon in including Node.js as part of their technology stack.The Course In BriefWith a Nodejs course, you learn beyond creating a simple HTML page, learn how to create a full-fledged web application, set up a web server, and interact with a database and much more, so much so that you can become a full stack developer in the shortest possible time and draw a handsome salary. The course of Node.js would provide you a much-needed jumpstart for your career.Node js: What is it?Developed by Ryan Dahl in 2009, Node.js is an open source and a cross-platform runtime environment that can be used for developing server-side and networking applications.Built on Chrome’s JavaScript runtime (V8 JavaScript engine) for easy building of fast and scalable network applications, Node.js uses an event-driven, non-blocking I/O model, making it lightweight and efficient, as well as well-suited for data-intensive real-time applications that run across distributed devices.Node.js applications are written in JavaScript and can be run within the Node.js runtime on different platforms – Mac OS X, Microsoft Windows, Unix, and Linux.What Makes Node js so Great?I/O is Asynchronous and Event-Driven: APIs of Node.js library are all asynchronous, i.e., non-blocking. It simply means that unlike PHP or ASP, a Node.js-based server never waits for an API to return data. The server moves on to the next API after calling it. The Node.js has a notification mechanism (Event mechanism) that helps the server get a response from the previous API call.Superfast: Owing to the above reason as well as the fact that it is built on Google Chrome’s V8 JavaScript Engine, Node JavaScript library is very fast in code execution.Single Threaded yet Highly Scalable: Node.js uses a single threaded model with event looping, in which the same program can ensure service to a much larger number of requests than the usual servers like Apache HTTP Server. Its Event mechanism helps the server to respond promptly in a non-blocking way, eliminating the waiting time. This makes the server highly scalable, unlike traditional servers that create limited threads to handle requests.No buffering: Node substantially reduces the total processing time of uploading audio and video files. Its applications never buffer any data; instead, they output the data in chunks.Open source: Node JavaScript has an open source community that has produced many excellent modules to add additional capabilities to Node.js applications.License: It was released under the MIT license.Eligibility to attend Node js CourseThe basic eligibility for pursuing Node training is a Bachelors in Computer Science, Bachelors of Technology in Computer Science and Engineering or an equivalent course.As prerequisites, you would require intermediate JavaScript skills and the basics of server-side development.CertificationThere are quite a few certification courses in Node Js. But first, ask yourself:Do you wish to launch your own Node applications or work as a Node developer?Do you want to learn modern server-side web development and apply it on apps /APIs?Do you want to use Node.js to create robust and scalable back-end applications?Do you aspire to build a career in back-end web application development?If you do, you’ve come to the right place!Course CurriculumA course in Node JavaScript surely includes theoretical lessons; but prominence is given to case studies, practical classes, including projects.  A good certification course would ideally train you to work with shrink-wrap to lock the node modules, build a HTTP Server with Node JS using HTTP APIs, as well as about important concepts of Node js like asynchronous programming, file systems, buffers, streams, events, socket.io, chat apps, and also Express.js, which is a flexible, yet powerful web application framework.Have You Decided Yet? Now that you know everything there is to know about why you should pursue a Node js course and a bit about the course itself, it is time for you to decide whether you are ready to embark on a journey full of exciting technological advancements and power to create fast, scalable and lightweight network applications.

Rated 4.5/5
based on 6 customer reviews


The Ultimate Guide to Node.Js

The Ultimate Guide to Node.Js

Blog

MIT’s new automated ML runs 100x faster than human Data Scientists

By Ruslan Bragin

According to Michigan State University and MIT, automated machine learning system analyses the data and deliver a solution 100x faster than one human. The automated machine learning platform which is known as ATM (Auto Tune Models) uses cloud-based, on demand computing to accelerate data analysis.

Researchers of MIT tested the system through open-ml.org, a collaborative crowdsourcing platform, on which data scientists collaborate to resolve problems. They found that ATM evaluated 47 datasets from the platform and the system was capable to deliver a solution that is better than humans. It took nearly 100 days for data scientists to deliver a solution, while it took less than a day for ATM to design a better-performing model.

«There are so many options,» said Ross, Franco Modigliani professor of financial economics at MIT, told MIT news. «If a data scientist chose support vector machines as a modeling technique, the question of whether she should have chosen a neural network to get better accuracy instead is always lingering in her mind.»

ATM searches via different techniques and tests thousands of models as well, analyses each, and offers more resources that solves the problem effectively. Then, the system exhibits its results to help researchers compare different methods. So, the system is not automating the human data scientists out of the process, Ross explained.

«We hope that our system will free up experts to spend more time on data understanding, problem formulation and feature engineering,» Kalyan Veeramachaneni, principal research scientist at MIT’s Laboratory for Information and Decision Systems and co-author of the paper, told MIT News.

Auto Tune Model is now made available for companies as an open source platform. It can operate on single machine, on-demand clusters, or local computing clusters in the cloud and can work with multiple users and multiple datasets simultaneously, MIT noted. «A small- to medium-sized data science team can set up and start producing models with just a few steps,» Veeramachaneni told MIT News.

Source: MIT Official Website

 

Rated 4.0/5
based on 20 customer reviews


MIT’s new automated ML runs 100x faster than human Data Scientists

MIT’s new automated ML runs 100x faster than human Data Scientists

What’s New

Follow Us On

Share on

Useful Links

Project description

Orange 3

Orange is a component-based data mining software. It includes a range of data
visualization, exploration, preprocessing and modeling techniques. It can be
used through a nice and intuitive user interface or, for more advanced users,
as a module for the Python programming language.

This is the latest version of Orange (for Python 3). The deprecated version of
Orange 2.7 (for Python 2.7) is still available (binaries and sources).

Installing with pip

To install Orange with pip, run the following.

# Install some build requirements via your system's package manager
sudo apt install virtualenv build-essential python3-dev

# Create a separate Python environment for Orange and its dependencies ...
virtualenv --python=python3 --system-site-packages orange3venv
# ... and make it the active one
source orange3venv/bin/activate

# Install Orange
pip install orange3

Starting Orange GUI

To start Orange GUI from the command line, run:

orange-canvas
# or
python3 -m Orange.canvas

Append --help for a list of program options.

Download files

Download the file for your platform. If you’re not sure which to choose, learn more about installing packages.

Source Distribution

Built Distributions

Понравилась статья? Поделить с друзьями:
  • Оперный театр руководство
  • Кардиоджет инструкция по применению схема наложения электродов
  • Sm ccr3035 инструкция на русском soundmax магнитола
  • Три регол инструкция по применению отзывы женщин после 40 лет
  • Мидзо инструкция по применению побочные действия