Thread: Undecided on Database Design for Collecting Sensor Data
Wed, Mar 20 2013 1:22 PM

Barry

I thought I'd bounce this problem off of you guys to see if you can come up with something. It's a bit long-winded, but please bear with me.

I have a database that needs to collect and analyze data gathered on a farm. The data is produced by a commercial control system with device sensors that record water consumption, temperature, humidity, fan activity, heater run times, etc.

This control system outputs the data as a CSV file at the end of each day, producing one record per hour; each record holds the readings for all of the devices. There are up to 10 temperature sensors, several fans, and several heaters. I can import this data into a Barn_Data table without any problem, with one record per CSV row (hour).

From the farmer's standpoint, all of the data can sit in this one table. That means it will have 10 sets of temperature columns (min, max, current for each temperature probe), 4 sets of fan durations, heater run times, etc. Importing the CSV data is quite fast, and when the data is in a grid the farmer can easily compare the data from one hour to the next. So he can see the water consumption increase, heater run times, and all 10 temperature readings from one hour to the next. I can also summarize the data by day if needed. This "flat file" table works fine. BTW, this table will be read-only and never changed once the data has been imported and "cleaned".

But from a database programmer's viewpoint, there should be a separate table for each type of device. The 10 sets of temperature columns should be moved to a Barn_Data_Temperature table with a column for the probe number, the Min/Max/Current temperatures, and a BD_Id (GUID) pointing to the parent Barn_Data row that holds the date, time, etc. The other devices would get their own tables, like Barn_Data_Heaters, Barn_Data_Water, etc., each with a BD_Id pointing to the same parent row in Barn_Data.

Here is the rub. I see no easy way of merging all of these child table rows back into a single grid row so the farmer can compare, say, all temperature probe readings from one hour to the next in a grid. Sure, I can have master/detail grids, and that would be fast enough, but then the farmer has no means of comparing temperature readings from one hour to the next, with feed consumption alongside. In a month there would be 24 hours × 31 days = 744 records. But multiply that by the number of temperature probes (×10), heaters (×4), etc., and this amounts to a lot of data that needs to be merged back together to form single rows for each BD_Id (barn data hour id).
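For reference, the standard way to fold child rows back into one row per hour is conditional aggregation (one CASE column per probe). A minimal sketch, using SQLite from Python purely for illustration - the table and column names here are made up, and EDB's SQL will differ in details:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Illustrative schema: one parent row per hour, one child row per probe reading.
cur.executescript("""
CREATE TABLE Barn_Data (BD_Id TEXT PRIMARY KEY, ReadingDate TEXT, ReadingHour INTEGER);
CREATE TABLE Barn_Data_Temperature (
    BD_Id TEXT, ProbeNo INTEGER, MinTemp REAL, MaxTemp REAL, CurTemp REAL);
INSERT INTO Barn_Data VALUES ('h1', '2013-03-20', 13);
INSERT INTO Barn_Data_Temperature VALUES
    ('h1', 1, 18.0, 22.5, 21.0),
    ('h1', 2, 17.5, 21.0, 19.8);
""")

# Conditional aggregation folds the child rows back into one row per hour:
# MAX() ignores the NULLs produced by non-matching CASE branches.
rows = cur.execute("""
SELECT b.BD_Id,
       MAX(CASE WHEN t.ProbeNo = 1 THEN t.CurTemp END) AS Probe1_Cur,
       MAX(CASE WHEN t.ProbeNo = 2 THEN t.CurTemp END) AS Probe2_Cur
FROM Barn_Data b
JOIN Barn_Data_Temperature t ON t.BD_Id = b.BD_Id
GROUP BY b.BD_Id
""").fetchall()
print(rows)  # [('h1', 21.0, 19.8)]
```

One CASE column per probe keeps it a single scan of the child table per hour range, rather than one lookup per probe.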

For the sake of the farmer I could leave all of the data in one table, but I also want several farms to use the same software and replicate the data into one central database. Having a different table structure for each farm's Barn_Data table is going to be a pain to sort out (each has a variable number of devices). That's why I'd like to have a table for each type of device sensor instead of throwing a variable number of sensor data into one large record.

1) So is there a fast solution to merging the child records back into a single hourly row?
2) Do I keep everything in one large Barn_Data table?
3) Do I have 2 sets of tables? One for the farmer and one for replication?

I couldn't find any ER diagram for monitoring equipment sensors on a farm or any other type of building. But I suspect they would have broken the data into separate sensor-type tables like I suggested here.

Any recommendations?
TIA

Barry
Wed, Mar 20 2013 2:47 PM

David Cornelius

Cornelius Concepts


This is an excellent story problem to present in a Computer Science course
on database design!  Here are some of my thoughts.

At only 744 records per month, that's less than 9,000 per year.  It would
take 100 farms 10 years to generate 9 million rows. That's not too bad for a
table.  Maybe they're summarized or archived periodically, so there's not
really a worry about space or speed.

You could use Views to easily see or report on different groups of data.
But the variable number of devices is a good reason to break out parts of
the data to different tables.  The SQL will get a little complex, but
JOINing with detail tables for hourly comparison of temperatures and other
data points is possible.

One thing you mentioned strikes me as a little odd: "For the sake of the
farmer I could leave all the data in one table."  Does this mean that the
only part you play in this is getting the data into a database, then the
farmer has to do queries to see the data?  Are you building a web site or
program that presents sorting and filtering options to the farmer to use?
If so, it doesn't really matter to the farmer how the data is stored as long
as he can see the data in a meaningful way, right?

So by separating out temperature devices, feed consumption, and other data
sets to separate tables, you're only restricting the farmer from comparing
readings from one hour to the next until you figure out how to re-combine it
in the grid, right?

I say go with option 1.  We can help you with the SQL.  Smile

David Cornelius
Cornelius Concepts
Wed, Mar 20 2013 2:58 PM

Fernando Dias

Team Elevate

Barry,

If you want to design it as a scalable system, for any number and type of sensors, you are right: storing the data in a flat file as it is now is not the best solution. However, I would *not* break it into many tables, one for each sensor.

Instead I'd use a different approach:
A Sensors table listing all the reading devices and a SensorReadings table containing all the readings for all the sensors. Then I'd dynamically generate temporary tables for displaying and reporting, using stored procedures or scripts as needed.

The data scheme would be something like this:
  Sensors( SensorID, SensorName,  ... )
  SensorReadings(SensorID, Date, Time, Value, Unit, ... )


--
Fernando Dias
[Team Elevate]
Wed, Mar 20 2013 3:05 PM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

Barry,

<< 1) So is there a fast solution to merging the child records back into a
single hourly row? >>

Well, I'm sure you know this already, but EDB *can* return live result sets
with queries like this:

SELECT CustNo,
       (SELECT Company FROM Customer
        WHERE Customer.CustNo = Orders.CustNo) AS Company
FROM Orders

So, you could, theoretically, merge in columns from multiple tables and
still get a fairly fast result set back.  Such columns work just like lookup
fields, so as long as the WHERE part of the sub-query is optimized, then the
lookups will be fast.
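Applied to the barn schema, the same lookup-style sub-select pattern might look like the following - a Python/SQLite sketch with illustrative table and column names (the principle carries over to EDB, though the engines differ):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Illustrative parent table plus two child tables keyed on BD_Id.
cur.executescript("""
CREATE TABLE Barn_Data (BD_Id TEXT PRIMARY KEY, ReadingHour INTEGER);
CREATE TABLE Barn_Data_Water  (BD_Id TEXT, Litres REAL);
CREATE TABLE Barn_Data_Heater (BD_Id TEXT, RunMinutes REAL);
INSERT INTO Barn_Data VALUES ('h1', 13);
INSERT INTO Barn_Data_Water  VALUES ('h1', 140.0);
INSERT INTO Barn_Data_Heater VALUES ('h1', 12.5);
""")

# One correlated sub-select per merged column, exactly like the
# Customer/Orders example; an index on each child table's BD_Id
# keeps every lookup cheap.
rows = cur.execute("""
SELECT BD_Id,
       (SELECT Litres     FROM Barn_Data_Water  w WHERE w.BD_Id = b.BD_Id) AS Water_L,
       (SELECT RunMinutes FROM Barn_Data_Heater h WHERE h.BD_Id = b.BD_Id) AS Heater_Min
FROM Barn_Data b
""").fetchall()
print(rows)  # [('h1', 140.0, 12.5)]
```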

Tim Young
Elevate Software
www.elevatesoft.com
Wed, Mar 20 2013 3:59 PM

Barry


David,

Thanks for the quick reply.

<This is an excellent story problem to present in a Computer Science course
on database design!>

I must have been away that day. <vbg>

<  Here are some of my thoughts.

At only 744 records per month, that's less than 9,000 per year.  It would
take 100 farms 10 years to generate 9 million rows. That's not too bad for a
table.  Maybe they're summarized or archived periodically, so there's not
really a worry about space or speed.

You could use Views to easily see or report on different groups of data.
But the variable number of devices is a good reason to break out parts of
the data to different tables.  The SQL will get a little complex, but
JOINing with detail tables for hourly comparison of temperatures and other
data points is possible.

One thing you mentioned strikes me as a little odd: "For the sake of the
farmer I could leave all the data in one table."  Does this mean that the
only part you play in this is getting the data into a database, then the
farmer has to do queries to see the data?  Are you building a web site or
program that presents sorting and filtering options to the farmer to use?
If so, it doesn't really matter to the farmer how the data is stored as long
as he can see the data in a meaningful way, right?>

I meant to say the farmer would like to "see" all of the data on one row for the hour.
And you're right, he doesn't care how the data is stored.


<So by separating out temperature devices, feed consumption, and other data
sets to separate tables, you're only restricting the farmer from comparing
readings from one hour to the next until you figure out how to re-combine it
in the grid, right?>

Correct

<I say go with option 1.  We can help you with the SQL.  Smile>

Thanks. My main concern doing it this way is speed. It will take anywhere from 10 to 20 sub-selects per hourly row (10 temperature probes and about 10 other sensors). The farmer will want to see 6 weeks worth of data at a time, so that is approx 1,000 hours. Multiply that by 20 sub-selects and that is a heck of a lot of I/O compared to keeping it all in one table.

So I'm torn between doing it the right way (separate tables) or the fast way (single table).

I will read over the other replies and mull it over some more. I'm sure there is a way to do it quickly. I have some test data so I will set the tables up with data and see how fast it is using sub-selects.

Barry
Wed, Mar 20 2013 4:57 PM

Barry

<Instead I'd use a different approach:
A Sensors table listing all the reading devices and a SensorReadings table containing all the readings for all the sensors. Then I'd dynamically generate temporary tables for displaying and reporting, using stored procedures or scripts as needed.

The data scheme would be something like this:
  Sensors( SensorID, SensorName,  ... )
  SensorReadings(SensorID, Date, Time, Value, Unit, ... )

>

Fernando,

That is an interesting way of looking at it.

Each type of sensor records different information, using one or more of the following units: minutes, min, max, kg, percent, count.

Temperature would record: SensorID, Current_Temperature (float), Max_Temperature (float), Min_Temperature (float)

Water would record: SensorID, Water_Consumption_L (float), Water_Consumption_Total_L (float)
Heater would record: SensorID, Run_Time_Minutes (float), Run_Time_Total_Minutes (float)

I gather what you're saying is to create a SensorReadings table with enough columns to handle the maximum number of readings a sensor will record, like:
SensorReadings Table:
-------------------------------
BD_Id (GUID) fk to Barn_Data.BD_Id
SensorId (VarChar)
SensorType: Char(1): 'T'=Temperature, 'W'=Water, 'F'=Fan, 'H'=Heater, 'U'=Humidity etc..
Float1 (Float)
Float2 (Float)
Float3 (Float)
Num1 (Integer)

Then create a view for each sensor type, like vTemperature, that looks only at SensorReadings rows with SensorType='T' (I added SensorType to this table to avoid a join to the Sensors table). I also added a BD_Id column so I can quickly get all sensor readings from SensorReadings for a particular hour (BD_Id).
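A minimal sketch of that generic table and the vTemperature view, in Python/SQLite purely for illustration (the column mapping Float1/2/3 → Current/Max/Min is an assumption from the earlier message):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
CREATE TABLE SensorReadings (
    BD_Id TEXT, SensorId TEXT, SensorType CHAR(1),
    Float1 REAL, Float2 REAL, Float3 REAL, Num1 INTEGER);
INSERT INTO SensorReadings VALUES
    ('h1', 'T1', 'T', 21.0, 22.5, 18.0, NULL),
    ('h1', 'W1', 'W', 140.0, 900.0, NULL, NULL);

-- One view per sensor type renames the generic columns back
-- into meaningful ones and filters on SensorType.
CREATE VIEW vTemperature AS
SELECT BD_Id, SensorId,
       Float1 AS Current_Temperature,
       Float2 AS Max_Temperature,
       Float3 AS Min_Temperature
FROM SensorReadings
WHERE SensorType = 'T';
""")

rows = cur.execute(
    "SELECT SensorId, Current_Temperature FROM vTemperature").fetchall()
print(rows)  # [('T1', 21.0)]
```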

This will certainly be faster, because I can create a temporary table with the sensor columns by examining the Sensors table and fill it with the rows from SensorReadings. This trades the 10-20 sub-selects I would have needed with separate tables for a temporary table filled with the same readings, retrieved with one select statement: "SELECT * FROM SensorReadings WHERE BD_Id = :BDId".

This approach is worth looking into. I have some test data and will see how difficult it is.

Thanks
Barry
Wed, Mar 20 2013 5:14 PM

Barry

<So, you could, theoretically, merge in columns from multiple tables and
still get a fairly fast result set back.  Such columns work just like lookup
fields, so as long as the WHERE part of the sub-query is optimized, then the
lookups will be fast.>

Yes, I've used sub-selects a lot. I'm a bit concerned using 15 to 20 of them in a Select statement though. They tend to slow things down after the first 4 or 5. BTW, the query doesn't have to be sensitive since the user won't be permitted to make changes to it.

I may try 15 or 20 table joins instead. I may load the sensor readings into a temporary memory table and slap an index on it to see if the speed is ok. I will try Fernando's approach with the table joins first since everything is stored in one SensorReadings table.
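The join-per-sensor alternative can be sketched like this - Python/SQLite with illustrative names, where a composite index on (BD_Id, SensorId) plays the role of the index slapped on the memory table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
CREATE TABLE SensorReadings (BD_Id TEXT, SensorId TEXT, Value REAL);
CREATE INDEX idx_sr ON SensorReadings (BD_Id, SensorId);
INSERT INTO SensorReadings VALUES ('h1', 'T1', 21.0), ('h1', 'T2', 19.8);
""")

# One self-join per sensor instead of one sub-select per sensor;
# the composite index makes each probe lookup an index seek.
rows = cur.execute("""
SELECT t1.BD_Id, t1.Value AS Temp1, t2.Value AS Temp2
FROM SensorReadings t1
JOIN SensorReadings t2 ON t2.BD_Id = t1.BD_Id AND t2.SensorId = 'T2'
WHERE t1.SensorId = 'T1'
""").fetchall()
print(rows)  # [('h1', 21.0, 19.8)]
```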

Barry
Wed, Mar 20 2013 8:27 PM

Fernando Dias

Team Elevate

Barry,

Yes, that's what I had in mind.
However, I didn't know that each sensor (or device, I'm not sure what to call it) could generate more than one value per reading. It's still applicable as long as you can safely assume a maximum for the number of values in a reading, and that number is not too big. The Sensors (or Devices) table would hold the list of sensor names, the meaning of each value, and the units of measure.

If it's not possible or reasonable to set such a maximum, then an alternative model would be to think of each one of the individual values as an independent sensor.

--
Fernando Dias
[Team Elevate]
Wed, Mar 20 2013 10:35 PM

Raul

Team Elevate

On 3/20/2013 1:22 PM, Barry wrote:
> 3) Do I have 2 sets of tables? One for the farmer and one for replication?

Barry,

Some good suggestions from others, but I'd go for the hybrid approach you indicated:
- proper tables for replication
- a single table for the farmer to look at

If your import process remains the same - one import at the end of the day - and it's read-only data, then you can easily populate both sets at import time and not really "lose" any functionality.

I don't know what your application requirements are, but you could spend your time on other areas of the app - getting the SQL done, optimizing it, and supporting it will IMHO take up more of your time than just maintaining the "farmer" table.

It's what many analytics solutions end up doing - flattening the data out for analysis and easy reporting.


Raul
Thu, Mar 21 2013 2:55 PM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

Barry,

<< Yes, I've used sub-selects a lot. I'm a bit concerned using 15 to 20 of
them in a Select statement though. They tend to slow things down after the
first 4 or 5. >>

I'd have to see an actual example, but they shouldn't be much slower than
any other type of generated column since they'll be generated entirely on
the server.

<< BTW, the query doesn't have to be sensitive since the user won't be
permitted to make changes to it. >>

I understand - I was just trying to give you a way of instantly
materializing a lot of "joins" without having to process the entire master
table.

If you have any other questions, please let me know.

Tim Young
Elevate Software
www.elevatesoft.com